Posted to solr-user@lucene.apache.org by jo...@aol.com on 2010/08/02 20:17:57 UTC
Phrase search
Hi All,
I don't understand why I'm getting this behavior. I was under the impression that if I search for "Apple 2" (with quotes and a space before “2”) I would get different results than if I search for "Apple2" (with quotes and no space before “2”), but I don't! Why?
Here is my fieldType setting from my schema.xml:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
What am I missing? Which part of my solr.WordDelimiterFilterFactory configuration needs to change (if that’s where the issue is)?
I’m using Solr 1.2.
Thanks in advance.
-M
Re: Phrase search
Posted by Chris Hostetter <ho...@fucit.org>.
: I'm trying to match "Apple 2" but not "Apple2" using phrase search, this is why I have it quoted.
: I was under the impression --when I use phrase search-- all the
: analyzer magic would not apply, but it is!!! Otherwise, how would I
: search for a phrase?!
well .. yes ... even with phrase searches your query is analyzed.
the only difference is that with a quoted phrase search, the entire phrase
is analyzed at one time -- when the input isn't quoted, the whitespace is
evaluated by the QueryParser as markup just like quotes and +/-,
etc... (unless it's escaped) and the individual words are analyzed
independently.
: Using Google, when I search for "Windows 7" (with quotes), unlike Solr,
: I don't get hits on "Windows7". I want to use catenateNumbers="1", which
: I want to take effect on other searches but not on phrase searches. Is
: this possible?
you need to elaborate more on what you do and don't want to match -- so
far you've given one example of a query you want to execute, and a
document you *don't* want to match that query, but not an example of what
types of documents you *do* want to match that query -- you also haven't
given examples of queries that you *do* want that example document to
match.
i suspect that catenateNumbers="1" isn't actually your problem ... it
sounds like you don't actually want WordDelimiterFilter doing the "split"
at index time at all.
Forget the phrase queries for a second: the question to ask yourself is:
when you index a document containing "Windows7" do you want a search for
the word Windows to match that document?
If the answer is "no" then you probably don't want WordDelimiterFilter at
all.
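As a rough sketch of that option (not a tested configuration), the field type below is the original "text" chain with WordDelimiterFilterFactory simply removed. The name "text_nosplit" is made up for illustration, and a single <analyzer> element is used on the assumption that the same chain should apply at both index and query time.

```xml
<!-- Hypothetical field type: the poster's "text" chain minus WordDelimiterFilterFactory. -->
<!-- "text_nosplit" is an illustrative name, not something from the original schema. -->
<fieldType name="text_nosplit" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer with no type attribute is used for both indexing and querying. -->
  <analyzer>
    <!-- Splits on whitespace only, so "Apple2" stays one token and "Apple 2" becomes two. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

The tradeoff is that nothing strips punctuation any more: with only whitespace tokenization, a token such as "Apple," keeps its comma and will not match a query for "Apple". Existing documents would also need to be reindexed after a change like this.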
-Hoss
Re: Phrase search
Posted by jo...@aol.com.
I'm trying to match "Apple 2" but not "Apple2" using phrase search, which is why I have it quoted.
I was under the impression that when I use phrase search, none of the analyzer magic would apply, but it does! Otherwise, how would I search for a phrase?!
Using Google, when I search for "Windows 7" (with quotes), unlike Solr, I don't get hits on "Windows7". I want to use catenateNumbers="1", which I want to take effect on other searches but not on phrase searches. Is this possible?
Yes, we are in the process of planning an upgrade to Solr 1.4.1 -- it takes time and a lot of effort to do such an upgrade where I work.
Thank you for your help and understanding.
-M
Re: Phrase search
Posted by Chris Hostetter <ho...@fucit.org>.
: I don't understand why i'm getting this behavior. I was under the
: impression if I search for "Apple 2" (with quotes and space before “2”)
: it will give me different results vs. if I search for "Apple2" (with
: quotes and no space before “2”), but I'm not! Why?
if you search "Apple 2" in quotes, then the analyzer for your field gets
the full string (with the space) and whatever it does with it and whatever
Terms it produces determines what Query gets executed. If you search
"Apple2" (w/ or w/o quotes) then the analyzer for your field gets the full
string and whatever it does with it and whatever Terms it produces determines
what Query gets executed.
None of that changes based on the analyzer you use.
With that in mind: i really don't understand your question. Let's step
back and instead of trying to explain *why* you are getting the results
you are getting (short answer: because that's how your analyzer works)
let's ask the question: what do you *want* to do? What do you *want* to
see happen when you enter various query strings?
http://people.apache.org/~hossman/#xyproblem
XY Problem
Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue. Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
: I’m using Solr 1.2
PS: Solr 1.2 had numerous bugs which were really really bad and which were
fixed in Solr 1.3. Solr 1.3 had numerous bugs which were really really
bad and were fixed in Solr 1.4. Solr 1.4 had a couple of bugs which were
really really bad and which were fixed in Solr 1.4.1 ... so even if you
don't want any of the new features, you should *REALLY* consider
upgrading.
-Hoss
Re: how to highlight string in jsp
Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: how to highlight string in jsp
: References: <8C...@webmail-m057.sysops.aol.com>
: <vm...@animal.buyways.nl>
: In-Reply-To: <vm...@animal.buyways.nl>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.
See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
-Hoss
how to highlight string in jsp
Posted by "Ma, Xiaohui (NIH/NLM/LHC) [C]" <xi...@mail.nlm.nih.gov>.
Hello,
I am trying to display the highlighted string in a different color in a JSP. I use the following in my servlet:
query.setHighlight(true).setHighlightSnippets(1);
query.setParam("hl.fl", "Abstract");
I wonder how I can display it in the JSP.
Thanks in advance.
xm
Re: Phrase search
Posted by jo...@aol.com.
I'm using Solr 1.2, so I don't have splitOnNumerics. Reading that URL, is my use of catenateNumbers="1" causing this? Should I set it to "0" vs. "1" as I have it now?
-M
RE: Re: Phrase search
Posted by Markus Jelsma <ma...@buyways.nl>.
Hi,
Queries on an analyzed field need to be analyzed as well, or they might not match. You can configure the WordDelimiterFilterFactory so that it does not split into multiple tokens because of numerics; see the splitOnNumerics parameter [1].
[1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
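A minimal sketch of that suggestion, assuming a Solr release whose WordDelimiterFilterFactory supports splitOnNumerics (elsewhere in this thread the original poster reports that Solr 1.2 does not have it). The other attribute values are carried over unchanged from the posted schema, and the same line would go into both the index and the query analyzer:

```xml
<!-- Sketch only: requires a Solr version that knows the splitOnNumerics attribute. -->
<!-- splitOnNumerics="0" asks the filter not to split at letter/digit transitions, -->
<!-- so "Apple2" is expected to stay a single token instead of becoming "Apple" + "2". -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnNumerics="0"/>
```

Because the index-time analysis changes, documents would need to be reindexed before the effect shows up in search results.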
Cheers,
Re: Phrase search
Posted by jo...@aol.com.
Thanks for the quick response.
Which part of my WordDelimiterFilterFactory is changing "Apple 2" to "Apple2"? How do I fix it? Also, I'm really confused about this. I was under the impression a phrase search is not impacted by the analyzer, no?
-M
RE: Phrase search
Posted by Markus Jelsma <ma...@buyways.nl>.
Well, the WordDelimiterFilterFactory in your query analyzer clearly makes "Apple 2" out of "Apple2"; that's what it's for. If you're looking for an exact match, use a string field. Check the output with the debugQuery=true parameter.
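For illustration only, a string field set up alongside the analyzed one might look like the fragment below. The field names "name" and "name_exact" are hypothetical (the thread never says which fields use the "text" type); only solr.StrField and copyField are standard schema.xml features.

```xml
<!-- Sketch with hypothetical field names; adjust to the real schema. -->
<!-- StrField is not analyzed: the whole field value is indexed as one verbatim term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Analyzed field (uses the existing "text" type) plus an exact-match copy. -->
<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_exact" type="string" indexed="true" stored="false"/>

<!-- Copies the raw, unanalyzed input of "name" into "name_exact" at index time. -->
<copyField source="name" dest="name_exact"/>
```

A query such as name_exact:"Apple 2" would then match only documents whose entire field value is exactly "Apple 2", case and spacing included, so this helps for whole-value matches rather than for phrases inside longer text. Adding debugQuery=true to the request shows how each query was parsed.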
Cheers,