Posted to solr-user@lucene.apache.org by Robert Petersen <ro...@buy.com> on 2011/07/16 00:43:54 UTC

Analysis page output vs. actually getting search matches, a discrepency?

I have a problem searching for one manufacturer name (out of our 10
million product titles). It is indexed in a text type field with about
the same analyzer settings as the Solr example text field definition.
Almost everything works fine, but we found this one example that I
cannot get a direct hit on.  On the Field Analysis page it sure looks
like it would *have* to match, but during searches it just doesn't.  I
can get it to match by turning off 'split on case change', but that
breaks many other searches like 'appleTV', which need to split on case
change to match 'apple tv' in our content!

 

If I search for SterlingTek's <anything>, I get zero results.

If I change the casing to Sterlingtek's in my query, I get all the
results.

If I turn off 'split on case change', then the first query gets results
too.
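
The three observations above can be illustrated with a rough Python sketch of case-change splitting and word catenation. This is not Solr's WordDelimiterFilter code; the function name `word_delimiter_sketch`, the possessive handling, and the catenation rule are simplifying assumptions chosen only to reproduce the token lists reported in this post.

```python
import re

def word_delimiter_sketch(token, catenate_words=False):
    """Rough imitation of splitOnCaseChange plus catenateWords; not Solr's code."""
    # Assumption: a trailing English possessive ('s) is dropped, which is
    # consistent with the analysis output in this post.
    token = re.sub(r"'s$", "", token)
    parts = []
    # Split on non-alphanumeric delimiters, then on case and letter/digit changes.
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    tokens = [p.lower() for p in parts]
    # Simplification (assumption): only catenate when every part is alphabetic.
    if catenate_words and len(parts) > 1 and all(p.isalpha() for p in parts):
        tokens.append("".join(parts).lower())
    return tokens

print(word_delimiter_sketch("SterlingTek's", catenate_words=True))  # index side
print(word_delimiter_sketch("SterlingTek's"))  # query side, camel case
print(word_delimiter_sketch("Sterlingtek's"))  # query side, no case change
print(word_delimiter_sketch("appleTV"))
```

In this sketch the query without a case change yields the single term `sterlingtek`, which equals the catenated term on the index side. That is consistent with the report that 'Sterlingtek's' matches, while the camel-case query depends on the two split terms instead.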

 

See the verbose analysis output for the actual filter settings. I put
the non-verbose output first for easier reading (I hope the tables don't
get lost in posting to this group). The analysis shows a complete
match-up, which is what I don't get:

 

Field Analysis

Field value (Index): SterlingTek's NB-2LH  (verbose output, highlight matches)
Field value (Query): SterlingTek's NB-2LH  (verbose output)

Index Analyzer

SterlingTek's | NB-2LH
SterlingTek's | NB-2LH
SterlingTek's | NB-2LH
Sterling | Tek | NB | 2 | LH | SterlingTek
sterling | tek | nb | 2 | lh | sterlingtek
sterling | tek | nb | 2 | lh | sterlingtek
sterling | tek | nb | 2 | lh | sterlingtek

Note every field is highlighted in the last line above meaning all have
a match, right???

Query Analyzer

SterlingTek's | NB-2LH
SterlingTek's | NB-2LH
SterlingTek's | NB-2LH
Sterling | Tek | NB | 2 | LH
sterling | tek | nb | 2 | lh
sterling | tek | nb | 2 | lh
sterling | tek | nb | 2 | lh

 

 

VERBOSE OUTPUT FOLLOWS:


Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

term position     | 1             | 2
term text         | SterlingTek's | NB-2LH
term type         | word          | word
source start,end  | 0,13          | 14,20
payload           |               |

org.apache.solr.analysis.SynonymFilterFactory
{synonyms=index_synonyms.txt, expand=true, ignoreCase=true}

term position     | 1             | 2
term text         | SterlingTek's | NB-2LH
term type         | word          | word
source start,end  | 0,13          | 14,20
payload           |               |

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true}

term position     | 1             | 2
term text         | SterlingTek's | NB-2LH
term type         | word          | word
source start,end  | 0,13          | 14,20
payload           |               |

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
splitOnCaseChange=1, generateNumberParts=1, catenateWords=1,
generateWordParts=1, catenateAll=0, catenateNumbers=1}

term position     | 1        | 2    | 3     | 4     | 5     |
term text         | Sterling | Tek  | NB    | 2     | LH    | SterlingTek
term type         | word     | word | word  | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20 | 0,11
payload           |          |      |       |       |       |

org.apache.solr.analysis.LowerCaseFilterFactory {}

term position     | 1        | 2    | 3     | 4     | 5     |
term text         | sterling | tek  | nb    | 2     | lh    | sterlingtek
term type         | word     | word | word  | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20 | 0,11
payload           |          |      |       |       |       |

com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
{protected=protwords.txt}

term position     | 1        | 2    | 3     | 4     | 5     |
term text         | sterling | tek  | nb    | 2     | lh    | sterlingtek
term type         | word     | word | word  | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20 | 0,11
payload           |          |      |       |       |       |

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

term position     | 1        | 2    | 3     | 4     | 5     |
term text         | sterling | tek  | nb    | 2     | lh    | sterlingtek
term type         | word     | word | word  | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20 | 0,11
payload           |          |      |       |       |       |

Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

term position     | 1             | 2
term text         | SterlingTek's | NB-2LH
term type         | word          | word
source start,end  | 0,13          | 14,20
payload           |               |

org.apache.solr.analysis.SynonymFilterFactory
{synonyms=query_synonyms.txt, expand=true, ignoreCase=true}

term position     | 1             | 2
term text         | SterlingTek's | NB-2LH
term type         | word          | word
source start,end  | 0,13          | 14,20
payload           |               |

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true}

term position     | 1             | 2
term text         | SterlingTek's | NB-2LH
term type         | word          | word
source start,end  | 0,13          | 14,20
payload           |               |

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
splitOnCaseChange=1, generateNumberParts=1, catenateWords=0,
generateWordParts=1, catenateAll=0, catenateNumbers=0}

term position     | 1        | 2    | 3     | 4     | 5
term text         | Sterling | Tek  | NB    | 2     | LH
term type         | word     | word | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20
payload           |          |      |       |       |

org.apache.solr.analysis.LowerCaseFilterFactory {}

term position     | 1        | 2    | 3     | 4     | 5
term text         | sterling | tek  | nb    | 2     | lh
term type         | word     | word | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20
payload           |          |      |       |       |

com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
{protected=protwords.txt}

term position     | 1        | 2    | 3     | 4     | 5
term text         | sterling | tek  | nb    | 2     | lh
term type         | word     | word | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20
payload           |          |      |       |       |

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

term position     | 1        | 2    | 3     | 4     | 5
term text         | sterling | tek  | nb    | 2     | lh
term type         | word     | word | word  | word  | word
source start,end  | 0,8      | 8,11 | 14,16 | 17,18 | 18,20
payload           |          |      |       |       |

 


RE: Analysis page output vs. actually getting search matches, a discrepency?

Posted by Robert Petersen <ro...@buy.com>.
Um, sorry for any confusion.  I meant to say that I solved my issue by inserting a charFilter before the WhitespaceTokenizerFactory to convert my problem word to a searchable form.  I had a cut-and-paste malfunction below.  Thanks, guys.
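
The workaround described here, rewriting the problem word before tokenization, can be sketched in Python as follows. The post does not say which charFilter class or mapping was used (Solr ships MappingCharFilterFactory and PatternReplaceCharFilterFactory, either of which could play this role), so the rewrite rule and the names `REWRITES`, `char_filter`, and `whitespace_tokenize` are assumptions for illustration only.

```python
import re

# Hypothetical rewrite rule; the post does not say which mapping was used.
REWRITES = [(re.compile(r"SterlingTek"), "Sterlingtek")]

def char_filter(text):
    """Rewrite raw characters before any tokenization happens."""
    for pattern, replacement in REWRITES:
        text = pattern.sub(replacement, text)
    return text

def whitespace_tokenize(text):
    """Stand-in for a whitespace tokenizer: split on runs of whitespace."""
    return text.split()

tokens = whitespace_tokenize(char_filter("SterlingTek's NB-2LH 2 Pack Batteries"))
print(tokens)
```

Because the rewrite removes the case change before the tokenizer and later filters see the text, the word is no longer split on case change, while other camel-case words such as 'appleTV' are left alone.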

-----Original Message-----
From: Robert Petersen [mailto:robertpe@buy.com] 
Sent: Tuesday, July 19, 2011 11:06 AM
To: solr-user@lucene.apache.org
Subject: RE: Analysis page output vs. actually getting search matches, a discrepency?

Thanks Erick,

Unfortunately I'm stemming the same way on both sides, similar to the Solr example settings for the text type field.  And yes, the default search field is moreWords, as I intend.

I don't have this problem with any other manufacturer names in our index of almost 10 million product docs, and in my best estimation the analysis output shows that it should match.

Note:  LucidKStemFilterFactory does not take 'Sterling' down to 'Sterl' at index time or at query time; it stays as 'Sterling'.

I have given up on this.  I've decided it is just an unexplainable anomaly, and have solved it by inserting a LucidKStemFilterFactory and just modifying that word to its searchable form before hitting the WhitespaceTokenizerFactory, which is kind of hackish but solves my problem at least.  This seller only has a couple hundred cheap products on our site, so I have bigger fish to fry at this point.  I've wasted too much time trying to chase this down.

Cheers all
Robi

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Monday, July 18, 2011 5:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Analysis page output vs. actually getting search matches, a discrepency?

Hmmm, is there any chance that you're stemming one place and
not the other?
And I infer from your output that your default search field is
"moreWords", is that true and expected?

You might use luke or the TermsComponent to see what's actually in
the index, I'm going to guess that you'll find "sterl" but not "sterling" as
an indexed term and your problem is stemming, but that's
a shot in the dark.....

Best
Erick

On Mon, Jul 18, 2011 at 5:37 PM, Robert Petersen <ro...@buy.com> wrote:
> OK I did what Hoss said, it only confirms I don't get a match when I
> should and that the query parser is doing the expected.  Here are the
> details for one test sku.
>
> My analysis page output is shown in my email starting this thread and
> here is my query debug output.  This absolutely should match but
> doesn't.  Both the indexing side and the query side are splitting on
> case changes.  This actually isn't a problem for any of our other
> content, for instance there is no issue searching for 'VideoSecu'.
> Their products come up fine in our searches regardless of casing in the
> query.  Only SterlingTek's products seem to be causing us issues.
>
> Indexed content has camel case, stored in the text field 'moreWords':
> "SterlingTek's NB-2LH 2 Pack Batteries + Charger Combo for Canon DC301"
> Search term not matching with camel case: "SterlingTek's"
> Search term matching if no case changes: "Sterlingtek's"
>
> Indexing:
> <filter class="solr.WordDelimiterFilterFactory"
>        generateWordParts="1"
>        generateNumberParts="1"
>        catenateWords="1"
>        catenateNumbers="1"
>        catenateAll="0"
>        splitOnCaseChange="1"
>        preserveOriginal="0"
> />
> Searching:
> <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1"
>         generateNumberParts="1"
>         catenateWords="0"
>         catenateNumbers="0"
>         catenateAll="0"
>         splitOnCaseChange="1"
>         preserveOriginal="0"
> />
>
> Thanks
>
> http://ssdevrh01.buy.com:8983/solr/10000/select?indent=on&version=2.2&q=SterlingTek%27s&fq=&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=sku%3A216473417&hl=on&hl.fl=&echoHandler=true&adf
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">4</int>
> <str
> name="handler">org.apache.solr.handler.component.SearchHandler</str>
> <lst name="params">
>  <str name="explainOther">sku:216473417</str>
>  <str name="indent">on</str>
>  <str name="echoHandler">true</str>
>  <str name="hl.fl"/>
>  <str name="wt">standard</str>
>  <str name="hl">on</str>
>  <str name="rows">1</str>
>  <str name="version">2.2</str>
>  <str name="fl">*,score</str>
>  <str name="debugQuery">on</str>
>  <str name="start">0</str>
>  <str name="q">SterlingTek's</str>
>  <str name="qt">standard</str>
>  <str name="fq"/>
> </lst>
> </lst>
> <result name="response" numFound="0" start="0" maxScore="0.0"/>
> <lst name="highlighting"/>
> <lst name="debug">
> <str name="rawquerystring">SterlingTek's</str>
> <str name="querystring">SterlingTek's</str>
> <str name="parsedquery">PhraseQuery(moreWords:"sterling tek")</str>
> <str name="parsedquery_toString">moreWords:"sterling tek"</str>
> <lst name="explain"/>
> <str name="otherQuery">sku:216473417</str>
> <lst name="explainOther">
> <str name="216473417">
> 0.0 = fieldWeight(moreWords:"sterling tek" in 76351), product of:
>  0.0 = tf(phraseFreq=0.0)
>  19.502613 = idf(moreWords: sterling=1 tek=72)
>  0.15625 = fieldNorm(field=moreWords, doc=76351)
>
> </str>
> </lst>
> <str name="QParser">LuceneQParser
> </str>
> <arr name="filter_queries">
> <str/>
> </arr>
>
>
>
> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Friday, July 15, 2011 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Analysis page output vs. actually getting search matches, a
> discrepency?
>
>
> : Subject: Analysis page output vs. actually getting search matches,
> :     a discrepency?
>
> 99% of the time when people ask questions like this, it's because of
> confusion about how/when QueryParsing comes into play (as opposed to
> analysis) -- analysis.jsp only shows you part of the equation, it
> doesn't know what query parser you are using.
>
> you mentioned that you aren't getting matches when you expect them, and
> you provided the analysis.jsp output, but you didn't mention anything
> about the request you are making, the query parser used etc....  it
> would be good to know the full query URL, along with the debugQuery
> output showing the final query toString info.
>
> if that info doesn't clear up the discrepancy, you should also take a
> look at the explainOther info for the doc that you expect to match that
> isn't -- if you still aren't sure what's going on, post all of that
> info to solr-user and folks can probably help you make sense of it.
>
> (all that said: in some instances this type of problem is simply that
> someone changed the schema and didn't reindex everything, so the indexed
> terms don't really match what you think they do)
>
>
> -Hoss
>

RE: Analysis page output vs. actually getting search matches, a discrepency?

Posted by Robert Petersen <ro...@buy.com>.
Thanks Erick,

Unfortunately I'm stemming the same way on both sides, similar to the Solr example settings for the text type field.  And yes, the default search field is moreWords, as I intend.

I don't have this problem with any other manufacturer names in our index of almost 10 million product docs, and in my best estimation the analysis output shows that it should match.

Note:  LucidKStemFilterFactory does not take 'Sterling' down to 'Sterl' at index time or at query time; it stays as 'Sterling'.

I have given up on this.  I've decided it is just an unexplainable anomaly, and have solved it by inserting a LucidKStemFilterFactory and just modifying that word to its searchable form before hitting the WhitespaceTokenizerFactory, which is kind of hackish but solves my problem at least.  This seller only has a couple hundred cheap products on our site, so I have bigger fish to fry at this point.  I've wasted too much time trying to chase this down.

Cheers all
Robi


Re: Analysis page output vs. actually getting search matches, a discrepency?

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, is there any chance that you're stemming one place and
not the other?
And I infer from your output that your default search field is
"moreWords", is that true and expected?

You might use luke or the TermsComponent to see what's actually in
the index, I'm going to guess that you'll find "sterl" but not "sterling" as
an indexed term and your problem is stemming, but that's
a shot in the dark.....
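
A minimal Python sketch of the TermsComponent check suggested above: it only builds the request URL for listing indexed terms that start with a given prefix. The base URL, the `/terms` handler path (registered in the Solr example configuration, but an assumption for this installation), and the helper name `terms_url` are illustrative.

```python
from urllib.parse import urlencode

def terms_url(base_url, field, prefix, limit=20):
    """Build a TermsComponent request for terms in `field` starting with `prefix`."""
    # Assumption: a /terms request handler is registered on this core.
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)

# Hypothetical host; moreWords is the field named in this thread.
print(terms_url("http://localhost:8983/solr", "moreWords", "sterl"))
```

If the response lists "sterling" and "sterlingtek" under the prefix, the stemming guess can be ruled out; if it lists only "sterl", stemming is the likely culprit.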

Best
Erick


RE: Analysis page output vs. actually getting search matches, a discrepency?

Posted by Robert Petersen <ro...@buy.com>.
OK I did what Hoss said, it only confirms I don't get a match when I
should and that the query parser is doing the expected.  Here are the
details for one test sku.

My analysis page output is shown in my email starting this thread and
here is my query debug output.  This absolutely should match but
doesn't.  Both the indexing side and the query side are splitting on
case changes.  This actually isn't a problem for any of our other
content, for instance there is no issue searching for 'VideoSecu'.
Their products come up fine in our searches regardless of casing in the
query.  Only SterlingTek's products seem to be causing us issues.

Indexed content has camel case, stored in the text field 'moreWords':
"SterlingTek's NB-2LH 2 Pack Batteries + Charger Combo for Canon DC301"
Search term not matching with camel case: "SterlingTek's"
Search term matching if no case changes: "Sterlingtek's"

Indexing:
<filter class="solr.WordDelimiterFilterFactory"
	generateWordParts="1"
	generateNumberParts="1"
	catenateWords="1"
	catenateNumbers="1"
	catenateAll="0"
	splitOnCaseChange="1"
	preserveOriginal="0"
/>
Searching:
<filter class="solr.WordDelimiterFilterFactory"
	 generateWordParts="1"
	 generateNumberParts="1"
	 catenateWords="0"
	 catenateNumbers="0"
	 catenateAll="0"
	 splitOnCaseChange="1"
	 preserveOriginal="0"
/>
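
To make the difference between the two filter definitions above explicit, here is a small Python sketch that compares the attribute values copied from them. The dictionary and function names are illustrative only.

```python
# Attribute values copied from the index-time and query-time definitions above.
INDEX_WDF = {
    "generateWordParts": "1", "generateNumberParts": "1",
    "catenateWords": "1", "catenateNumbers": "1", "catenateAll": "0",
    "splitOnCaseChange": "1", "preserveOriginal": "0",
}
QUERY_WDF = {
    "generateWordParts": "1", "generateNumberParts": "1",
    "catenateWords": "0", "catenateNumbers": "0", "catenateAll": "0",
    "splitOnCaseChange": "1", "preserveOriginal": "0",
}

def diff_settings(index_cfg, query_cfg):
    """Return {attribute: (index value, query value)} for settings that differ."""
    return {
        key: (index_cfg[key], query_cfg.get(key))
        for key in sorted(index_cfg)
        if index_cfg[key] != query_cfg.get(key)
    }

print(diff_settings(INDEX_WDF, QUERY_WDF))
```

Only catenateWords and catenateNumbers differ, so the catenated term 'sterlingtek' exists on the index side but is never produced on the query side.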

Thanks

http://ssdevrh01.buy.com:8983/solr/10000/select?indent=on&version=2.2&q=SterlingTek%27s&fq=&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=sku%3A216473417&hl=on&hl.fl=&echoHandler=true&adf

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">4</int>
<str
name="handler">org.apache.solr.handler.component.SearchHandler</str>
<lst name="params">
 <str name="explainOther">sku:216473417</str> 
 <str name="indent">on</str>
 <str name="echoHandler">true</str>
 <str name="hl.fl"/>
 <str name="wt">standard</str>
 <str name="hl">on</str>
 <str name="rows">1</str>
 <str name="version">2.2</str>
 <str name="fl">*,score</str>
 <str name="debugQuery">on</str>
 <str name="start">0</str>
 <str name="q">SterlingTek's</str>
 <str name="qt">standard</str>
 <str name="fq"/>
</lst>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="highlighting"/>
<lst name="debug">
<str name="rawquerystring">SterlingTek's</str>
<str name="querystring">SterlingTek's</str>
<str name="parsedquery">PhraseQuery(moreWords:"sterling tek")</str>
<str name="parsedquery_toString">moreWords:"sterling tek"</str>
<lst name="explain"/>
<str name="otherQuery">sku:216473417</str>
<lst name="explainOther">
<str name="216473417">
0.0 = fieldWeight(moreWords:"sterling tek" in 76351), product of:
  0.0 = tf(phraseFreq=0.0)
  19.502613 = idf(moreWords: sterling=1 tek=72)
  0.15625 = fieldNorm(field=moreWords, doc=76351)

</str>
</lst>
<str name="QParser">LuceneQParser
</str>
<arr name="filter_queries">
<str/>
</arr>
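
The explain output above reports phraseFreq=0.0, which means the phrase "sterling tek" was not found at adjacent positions in the indexed document, so the score is 0.0 regardless of idf and fieldNorm. The Python sketch below shows what an exact phrase check over term positions looks like. It is a simplified model, not Lucene's PhraseQuery implementation; the first position map is taken from the analysis page output in this thread, and the second is a made-up case with a gap.

```python
def phrase_freq(positions, phrase):
    """Count exact (slop 0) occurrences of `phrase` in a term -> positions map."""
    first, rest = phrase[0], phrase[1:]
    count = 0
    for start in positions.get(first, []):
        # Every following term must sit exactly one position after the previous one.
        if all(start + offset in positions.get(term, [])
               for offset, term in enumerate(rest, start=1)):
            count += 1
    return count

# Positions as reported by the analysis page for the index side.
analysis_positions = {"sterling": [1], "tek": [2], "nb": [3], "2": [4], "lh": [5]}
# Hypothetical positions with a gap between the two terms.
gapped_positions = {"sterling": [1], "tek": [3]}

print(phrase_freq(analysis_positions, ["sterling", "tek"]))
print(phrase_freq(gapped_positions, ["sterling", "tek"]))
```

With the positions shown on the analysis page the phrase would be found once, so a phraseFreq of 0.0 suggests that the terms or positions actually stored in the index differ from what the analysis page displays.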




RE: Analysis page output vs. actually getting search matches, a discrepency?

Posted by Robert Petersen <ro...@buy.com>.
Hi Chris, 

Well to start from the bottom of your list there, I restrict my testing
to one sku while continuously reindexing the sku after every indexer
side change, and reload the core every time also.  I just search from
the admin page using the word in question and the exact match on the sku
field (the unique one) like this:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">SterlingTek's NB-2LH sku:216473417</str>
<str name="bbb">aaaaa</str>
<str name="rows">10</str>
<str name="version">2.2</str>
</lst>
</lst>

I will have to find out more about query parsers before I can answer the
rest. I will reply to that later... and it's Friday after all!  :)

Thanks



Re: Analysis page output vs. actually getting search matches, a discrepency?

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Analysis page output vs. actually getting search matches,
:     a discrepency?

99% of the time when people ask questions like this, it's because of 
confusion about how/when QueryParsing comes into play (as opposed to 
analysis) -- analysis.jsp only shows you part of the equation, it doesn't 
know what query parser you are using.

you mentioned that you aren't getting matches when you expect them, and 
you provided the analysis.jsp output, but you didn't mention anything 
about the request you are making, the query parser used etc....  it would 
be good to know the full query URL, along with the debugQuery output 
showing the final query toString info.
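
As a rough illustration of the request being asked for here, the Python sketch below builds a select URL with debugQuery turned on and an explainOther clause for the document that is expected to match. The base URL and the helper name `debug_url` are assumptions; q, fl, rows, debugQuery, and explainOther are standard Solr request parameters.

```python
from urllib.parse import urlencode

def debug_url(base_url, query, other_doc_query):
    """Build a select request that returns parsed-query and explainOther debug info."""
    params = {
        "q": query,
        "fl": "*,score",
        "rows": 1,
        "debugQuery": "on",
        # Explain how the query scores against a document that did not match.
        "explainOther": other_doc_query,
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical host; the query and sku value are the ones used in this thread.
print(debug_url("http://localhost:8983/solr", "SterlingTek's", "sku:216473417"))
```

The debug section of the response then shows the final parsed query and, under explainOther, why the named document did or did not score.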

if that info doesn't clear up the discrepancy, you should also take a look 
at the explainOther info for the doc that you expect to match that isn't 
-- if you still aren't sure what's going on, post all of that info to 
solr-user and folks can probably help you make sense of it.

(all that said: in some instances this type of problem is simply that 
someone changed the schema and didn't reindex everything, so the indexed 
terms don't really match what you think they do)


-Hoss