Posted to solr-user@lucene.apache.org by jo...@aol.com on 2010/08/02 20:17:57 UTC

Phrase search

Hi All,
 
I don't understand why I'm getting this behavior. I was under the impression that if I search for "Apple 2" (with quotes and a space before “2”), I would get different results than if I search for "Apple2" (with quotes and no space before “2”), but I don't! Why?
 
Here is my fieldType setting from my schema.xml:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
 
What am I missing?! Which part of my solr.WordDelimiterFilterFactory needs to change (if that’s where the issue is)?

I’m using Solr 1.2.

Thanks in advance.
 
-M

Re: Phrase search

Posted by Chris Hostetter <ho...@fucit.org>.

: I'm trying to match "Apple 2" but not "Apple2" using phrase search; this is why I have it quoted.

: I was under the impression that, when I use phrase search, none of the
: analyzer magic would apply, but it does!!! Otherwise, how would I
: search for a phrase?!

Well... yes. Even with phrase searches, your query is analyzed.

The only difference is that with a quoted phrase search, the entire phrase
is analyzed at one time. When the input isn't quoted, the whitespace is
treated by the QueryParser as markup, just like quotes and +/- etc.
(unless it's escaped), and the individual words are analyzed
independently.
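To make this concrete for the configuration posted in this thread, here is an approximate trace of what its query analyzer does with each input, written as an XML comment so it could sit next to the fieldType in schema.xml. It is a sketch, not captured output: it assumes neither word appears in stopwords.txt or protwords.txt, and it relies on the Solr 1.x query parser turning several tokens from one unquoted chunk into a phrase query. debugQuery=true or the admin analysis page will show the real tokens for a given version.

    <!--
      Query analyzer of the posted "text" type (approximate trace).

      "Apple 2" (quoted, so the whole string reaches the analyzer):
        WhitespaceTokenizer    Apple | 2
        WordDelimiterFilter    Apple | 2    nothing left to split
        LowerCaseFilter        apple | 2
        EnglishPorterFilter    appl  | 2
        query: phrase "appl 2"

      "Apple2" (with or without quotes):
        WhitespaceTokenizer    Apple2
        WordDelimiterFilter    Apple | 2    split at the letter/digit boundary
        LowerCaseFilter        apple | 2
        EnglishPorterFilter    appl  | 2
        query: the same phrase "appl 2"

      Apple 2 (unquoted): the QueryParser splits on the space first, so
      "Apple" and "2" are analyzed separately and become two independent
      clauses rather than one phrase.
    -->

Because the index analyzer applies the same WordDelimiterFilter settings, a document containing Apple2 is indexed with the adjacent terms appl and 2, which is why it matches the phrase search for "Apple 2".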

: Using Google, when I search for "Windows 7" (with quotes), unlike Solr,
: I don't get hits on "Windows7". I want catenateNumbers="1" to take
: effect on other searches but not on phrase searches. Is this
: possible?

You need to elaborate more on what you do and don't want to match. So
far you've given one example of a query you want to execute, and a
document you *don't* want to match that query, but no example of the
types of documents you *do* want to match that query. You also haven't
given examples of queries that you *do* want that example document to
match.

I suspect that catenateNumbers="1" isn't actually your problem ... it
sounds like you don't actually want WordDelimiterFilter doing the "split"
at index time at all.

Forget the phrase queries for a second. The question to ask yourself is:
when you index a document containing "Windows7", do you want a search for
the word Windows to match that document?

If the answer is "no" then you probably don't want WordDelimiterFilter at
all.
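If the answer is "no", a field type along these lines would leave WordDelimiterFilter out entirely. This is a sketch: the name text_nowdf is hypothetical, the other filters are copied from the configuration posted in this thread, and existing documents would need to be reindexed after the index analyzer changes.

    <!-- Hypothetical field type: the posted "text" type minus
         WordDelimiterFilterFactory, so "Apple2" stays a single token -->
    <fieldType name="text_nowdf" class="solr.TextField" positionIncrementGap="100">
      <!-- an analyzer with no type attribute is used for both indexing and querying -->
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

With such a type, "Windows7" and "Apple2" should each stay a single token, so a phrase search for "Apple 2" would no longer match a document containing only "Apple2"; the trade-off is that a search for Windows would not match "Windows7" either.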

-Hoss

Re: Phrase search

Posted by jo...@aol.com.


I'm trying to match "Apple 2" but not "Apple2" using phrase search; this is why I have it quoted.

I was under the impression that, when I use phrase search, none of the analyzer magic would apply, but it does!!! Otherwise, how would I search for a phrase?!

Using Google, when I search for "Windows 7" (with quotes), unlike Solr, I don't get hits on "Windows7". I want catenateNumbers="1" to take effect on other searches but not on phrase searches. Is this possible?

Yes, we are in the process of planning an upgrade to Solr 1.4.1 -- it takes time and a lot of effort to do such an upgrade where I work.
 
Thank you for your help and understanding.
 
-M






-----Original Message-----
From: Chris Hostetter <ho...@fucit.org>
To: solr-user@lucene.apache.org
Sent: Mon, Aug 2, 2010 5:41 pm
Subject: Re: Phrase search




Re: Phrase search

Posted by Chris Hostetter <ho...@fucit.org>.

: I don't understand why I'm getting this behavior. I was under the
: impression that if I search for "Apple 2" (with quotes and a space before “2”),
: I would get different results than if I search for "Apple2" (with
: quotes and no space before “2”), but I don't! Why?

If you search for "Apple 2" in quotes, then the analyzer for your field gets
the full string (with the space), and whatever it does with it, and whatever
Terms it produces, determine what Query gets executed.  If you search for
"Apple2" (w/ or w/o quotes), then the analyzer for your field gets the full
string, and whatever it does with it, and whatever Terms it produces,
determine what Query gets executed.

None of that changes based on the analyzer you use.

With that in mind: I really don't understand your question.  Let's step
back, and instead of trying to explain *why* you are getting the results
you are getting (short answer: because that's how your analyzer works),
let's ask the question: what do you *want* to do?  What do you *want* to
see happen when you enter various query strings?

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

: I’m using Solr 1.2

PS: Solr 1.2 had numerous bugs which were really really bad and which were
fixed in Solr 1.3.  Solr 1.3 had numerous bugs which were really really
bad and were fixed in Solr 1.4.  Solr 1.4 had a couple of bugs which were
really really bad and which were fixed in Solr 1.4.1 ... so even if you
don't want any of the new features, you should *REALLY* consider
upgrading.


-Hoss

Re: how to highlight string in jsp

Posted by Chris Hostetter <ho...@fucit.org>.

: Subject: how to highlight string in jsp
: References: <8C...@webmail-m057.sysops.aol.com>
:  <vm...@animal.buyways.nl>
: In-Reply-To: <vm...@animal.buyways.nl>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message; instead, start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss

how to highlight string in jsp

Posted by "Ma, Xiaohui (NIH/NLM/LHC) [C]" <xi...@mail.nlm.nih.gov>.

Hello,

I am trying to display the highlighted string in a different color in a JSP. I use the following in my servlet:

query.setHighlight(true).setHighlightSnippets(1);  // enable highlighting, at most one snippet per field
query.setParam("hl.fl", "Abstract");               // highlight matches in the Abstract field

I wonder how I can display it in my JSP.

Thanks in advance.
xm
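One possible approach, sketched under assumptions: the highlighter wraps matched terms in <em> and </em> by default, and the hl.simple.pre and hl.simple.post parameters replace those tags with any markup a page can style. In SolrJ the snippets come back from QueryResponse.getHighlighting(), a Map<String, Map<String, List<String>>> keyed by document unique key and then by field name (here Abstract); the JSP then writes a snippet out without HTML-escaping it so the markup takes effect. The handler name and class below are those of the stock Solr 1.4 example configuration and the CSS class hl is hypothetical; the same two parameters can also be sent per request with query.setParam(...).

    <!-- solrconfig.xml: wrap highlighted terms in a span with a
         hypothetical CSS class instead of the default em tags -->
    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <str name="hl.simple.pre"><![CDATA[<span class="hl">]]></str>
        <str name="hl.simple.post"><![CDATA[</span>]]></str>
      </lst>
    </requestHandler>

A style rule such as span.hl { color: red; } on the page would then give the highlighted terms a different color.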

Re: Phrase search

Posted by jo...@aol.com.

I'm using Solr 1.2, so I don't have splitOnNumerics. Reading that URL, is my use of catenateNumbers="1" causing this? Should I set it to "0" instead of "1", as I have it now?
 
-M




-----Original Message-----
From: Markus Jelsma <ma...@buyways.nl>
To: solr-user@lucene.apache.org
Sent: Mon, Aug 2, 2010 3:54 pm
Subject: RE: Re: Phrase search



RE: Re: Phrase search

Posted by Markus Jelsma <ma...@buyways.nl>.

Hi,

Queries on an analyzed field need to be analyzed as well, or they might not match. You can configure the WordDelimiterFilterFactory so it will not split into multiple tokens because of numerics; see the splitOnNumerics parameter [1].

 

[1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
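For Solr 1.4 and later, the change Markus suggests might look like the line below: the posted WordDelimiterFilterFactory settings with splitOnNumerics="0" added. It is a sketch; it would need to go into both the index and query analyzers (followed by a reindex), and, as the original poster notes elsewhere in the thread, the parameter does not exist in Solr 1.2.

    <!-- Solr 1.4+ only: splitOnNumerics="0" stops the filter from splitting
         "Apple2" at the letter/digit boundary -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnNumerics="0"/>

With this setting, "Apple2" should reach the later filters as a single token, so it no longer lines up with the two-term phrase produced by "Apple 2".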

 

Cheers,


 
-----Original message-----
From: johnmunir@aol.com
Sent: Mon 02-08-2010 21:29
To: solr-user@lucene.apache.org; 
Subject: Re: Phrase search






Re: Phrase search

Posted by jo...@aol.com.




Thanks for the quick response.

Which part of my WordDelimiterFilterFactory is changing "Apple 2" to "Apple2"?  How do I fix it?  Also, I'm really confused about this.  I was under the impression a phrase search is not impacted by the analyzer, no?

-M


-----Original Message-----
From: Markus Jelsma <ma...@buyways.nl>
To: solr-user@lucene.apache.org
Sent: Mon, Aug 2, 2010 2:27 pm
Subject: RE: Phrase search



RE: Phrase search

Posted by Markus Jelsma <ma...@buyways.nl>.

Well, the WordDelimiterFilterFactory in your query analyzer clearly makes "Apple 2" out of "Apple2"; that's what it's for. If you're looking for an exact match, use a string field. Check the output with the debugQuery=true parameter.
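A minimal sketch of the string-field route, assuming the text being searched lives in a field called title (a hypothetical name): copy it into an untokenized field and query that field when an exact match is wanted. Note that a string field is not analyzed at all, so title_exact:"Apple 2" would match only documents whose whole value is exactly Apple 2, including case; it would not find the phrase inside longer text.

    <!-- schema.xml; each element goes in its usual section.
         "string" is the StrField type from the stock example schema. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- hypothetical field names -->
    <field name="title_exact" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

Adding debugQuery=true to the request then shows the parsed query for each field, which makes it easy to compare how the analyzed field and the string field treat the same input.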

 

Cheers, 
 
-----Original message-----
From: johnmunir@aol.com
Sent: Mon 02-08-2010 20:18
To: solr-user@lucene.apache.org; 
Subject: Phrase search

