Posted to solr-user@lucene.apache.org by Robert Petersen <ro...@buy.com> on 2011/04/22 02:06:56 UTC

term position question from analyzer stack for WordDelimiterFilterFactory

If I don't set preserveOriginal=1 in my WordDelimiterFilterFactory settings, I cannot get a match between AppleTV on the indexing side and appletv on the search side.  Without that setting, the all-lowercase version of AppleTV lands in term position 2 because of the catenateWords=1 or catenateAll=1 settings.  I am surprised.  How does term position affect searching?  Here is my analysis with preserveOriginal=1, which makes the lowercase term occur in both term positions 1 and 2:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position 	1
term text 	AppleTV
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
term position 	1
term text 	AppleTV
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 	1
term text 	AppleTV
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position      1          2
term text          AppleTV    TV
                   Apple      AppleTV
term type          word       word
                   word       word
source start,end   0,7        5,7
                   0,5        0,7
payload
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position      1          2
term text          appletv    tv
                   apple      appletv
term type          word       word
                   word       word
source start,end   0,7        5,7
                   0,5        0,7
payload
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position      1          2
term text          appletv    tv
                   apple      appletv
term type          word       word
                   word       word
source start,end   0,7        5,7
                   0,5        0,7
payload
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position      1          2
term text          appletv    tv
                   apple      appletv
term type          word       word
                   word       word
source start,end   0,7        5,7
                   0,5        0,7
payload
Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position 	1
term text 	appletv
term type 	word
source start,end 	0,7
payload 	

Re: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Erick Erickson <er...@gmail.com>.
I second Otis' comments. Is it possible that you've gotten twisted
around by trying to modify these settings and would be better off
going back to the WDF settings in the example schema? I've
sometimes found that to be very useful.

Also (although I don't think it applies in this case) be aware that
the analysis page may introduce its own errors, so when you see
something really wonky, try a query with &debugQuery=on and see
if the parsed query squares with the results on the analysis page...

Best
Erick

On Tue, Apr 26, 2011 at 5:44 PM, Robert Petersen <ro...@buy.com> wrote:
> Yeah I am about to try turning one on at a time and see what happens.  I
> had a meeting so couldn't do it yet...  (darn those meetings)  (lol)

RE: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Robert Petersen <ro...@buy.com>.
Yeah I am about to try turning one on at a time and see what happens.  I
had a meeting so couldn't do it yet...  (darn those meetings)  (lol)


Re: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Robert,

I'm no WDF expert, but all these zeros look suspicious:

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}

A quick visit to 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
 makes me think you want:

splitOnCaseChange=1  (if you want Mc Afee for some reason?)
generateWordParts=1 (if you want Mc Afee for some reason?)
preserveOriginal=1


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



RE: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Robert Petersen <ro...@buy.com>.
OK this is even more weird... everything is working much better except
for one thing: I was testing use cases with our top query terms to make
sure the below query settings wouldn't break any existing behavior, and
got this most unusual result.  The analyzer stack completely eliminated
the word McAfee from the query terms!  I'm like huh?  Here is the
analyzer page output for that search term:

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position 	1
term text 	McAfee
term type 	word
source start,end 	0,6
payload 	
org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
term position 	1
term text 	McAfee
term type 	word
source start,end 	0,6
payload 	
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 	1
term text 	McAfee
term type 	word
source start,end 	0,6
payload 	
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}
term position
term text
term type
source start,end
payload
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position
term text
term type
source start,end
payload
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position
term text
term type
source start,end
payload
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position
term text
term type
source start,end
payload
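
One plausible reading of the empty output above: splitOnCaseChange is not set, so it is presumably at its default of 1 and McAfee gets split at the case change into Mc and Afee. With every generate, catenate and preserve flag at 0, the filter then has nothing left to emit for that token. A hedged sketch of a query-side filter that keeps such terms under that assumption (the values are illustrative, not a tested recommendation for this index):

```xml
<!-- Query-side WDF sketch: emit the word parts instead of dropping them. -->
<!-- Assumption: the index side also generates parts and/or catenated forms. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="1"
        preserveOriginal="0"/>
```

Setting splitOnCaseChange="0" instead would leave McAfee as a single token, at the cost of no longer splitting AppleTV into apple and tv at query time.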



RE: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Robert Petersen <ro...@buy.com>.
Aha!  I knew something must be awry, but when I looked at the analysis
page output, well it sure looked like it should match.  :)

OK here is the query side WDF that finally works, I just turned
everything off.  (yay)  First I tried just completely removing WDF from
the query side analyzer stack but that didn't work.  So anyway I suppose
I should turn off the catenate all plus the preserve original settings,
reindex, and see if I still get a match huh?  (PS  thank you very much
for the help!!!)

          <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0"
                generateNumberParts="0"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="0"
                />	



Re: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen <ro...@buy.com> wrote:
> The search and index analyzer stack are the same.

Ahhh, they should not be!
Using both generate and catenate in WDF at query time is a no-no.
Same reason you can't have multi-word synonyms at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

I'd recommend going back to the WDF settings in the solr example
server as a starting point.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
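
A sketch of what separate index-time and query-time WDF settings can look like in schema.xml, modeled on the "text" field type in the Solr example schema of that era. The field type name text_wdf_sketch is made up for illustration, the filter chain is trimmed to the essentials (no synonyms, stop words or stemming), and the exact attribute values should be checked against the example schema shipped with the Solr version in use:

```xml
<!-- Hypothetical field type name; only the WDF attributes matter here. -->
<fieldType name="text_wdf_sketch" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: generate the parts and the catenated form, so that
       AppleTV is indexed as apple, tv and appletv. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: generate the parts only, with no catenation. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this split, a query for appletv is a single term that can match the catenated token indexed from AppleTV, while a query for AppleTV is broken into the parts apple and tv. Documents need to be reindexed after the index-side analyzer changes.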

RE: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Robert Petersen <ro...@buy.com>.
Sorry, that was supposed to be just another way to say the same thing...
OK look here is my current situation.  Even with preserveOriginal and
catenateAll set, I am still getting an even odder result.

I set up sku=218078624 with title=" Beanbag AppleTV Friction Dash Mount
for GPS " and index it in dev.

The search and index analyzer stack are the same.  When I do this search
in the solr admin page I get zero results " sku:218078624  title:AppleTV
" but when I do this search I get one result " sku:218078624
title:appletv ".  This is the opposite of what was happening before I
added the preserve original setting.  In the analysis page I plug in
that title and term, and it looks to me like it should match... which is
why I started asking about term positions and such.  I don't understand
why I don't get a hit in both cases.  It is so weird.



Re: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen <ro...@buy.com> wrote:
> I can repeatedly demonstrate this in my dev environment, where I get
> entirely different results searching for AppleTV vs. appletv

You originally said "I cannot get a match between AppleTV on the
indexing side and appletv on the search side".
Getting different numbers of results or different results is slightly different.

For example, if there were a document with "Apple TV" in it, then a
query of "AppleTV" would match that doc, but a query of "appletv"
would not.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

RE: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Robert Petersen <ro...@buy.com>.
I can repeatedly demonstrate this in my dev environment, where I get
entirely different results searching for AppleTV vs. appletv and I
really just don't get it.  I set up a specific sku in dev with AppleTV
in its title to experiment with.  What can I provide to help diagnose?
I need to make this work...  thanks for the help!


Re: term position question from analyzer stack for WordDelimiterFilterFactory

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Apr 21, 2011 at 8:06 PM, Robert Petersen <ro...@buy.com> wrote:
> So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings I cannot get a match between AppleTV on the indexing side and appletv on the search side.

Hmmm, that shouldn't be the case.  The "text" field in the solr
example config doesn't use preserveOriginal, and AppleTV is indexed as

appl, tv/appletv

And a search for appletv does match fine.

Perhaps on the search side there is actually a phrase query like "big
appletv"?  One workaround for that is to add a little slop... "big
appletv"~1

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco