Posted to solr-user@lucene.apache.org by Marian Steinbach <ma...@sendung.de> on 2012/02/03 17:11:17 UTC
Zero Matches Weirdness
Hi!
I am having a weird issue with a search string not producing a match
where it should. I can reproduce it with both 3.4 and 3.5.
"Where it should" means that I am getting a hit in the "Analyse" tool
in the admin panel, but not in a query via /select.
Now when I try
select?q=Am+Heidstamm&...
I get zero results back. But, when I quote the string
select?q=%22Am+Heidstamm%22&...
I get several hits.
BTW, the token "am" is filtered out in the field "text", since it's in a
stopword list.
Any ideas on how this can be explained?
My defaultSearchField is "text". The field gets its content via
several copyField statements.
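For readers reconstructing the setup: in a Solr 3.x schema.xml, the default search field, the default operator (Marian mentions "AND" later in the thread) and the copyField statements would look roughly like the fragment below. The thread does not say which fields are copied into text; the source names here are hypothetical, borrowed from the qf value quoted further down.

```xml
<!-- schema.xml fragment (Solr 3.x syntax), sketch only -->
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>

<!-- hypothetical copyField sources; the actual list is not given in the thread -->
<copyField source="betreff" dest="text"/>
<copyField source="body" dest="text"/>
```

Note that with the edismax parser (the QParser shown in the debug output later in the thread) and qf set, the fields actually searched come from qf rather than from defaultSearchField.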
The configuration for text is as follows:
<field name="text" type="text_de" indexed="true" stored="false"
multiValued="true" />
The configuration for type text_de is this:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- protect slashes from tokenizer by replacing with something unique -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([A-Z]+)/([0-9]+)/([0-9]+)" replacement="$1ḧ$2ḧ$3" />
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([0-9]+)/([0-9]+)" replacement="$1ḧ$2" />
    <!-- protect paragraph symbol from tokenizer -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="§\s*([0-9]+)" replacement="ǚ$1" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" preserveOriginal="1"
        splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords_de.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.GermanMinimalStemFilterFactory" />
    <!-- get slashes back in -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ḧ"
        replacement="/" />
    <!-- get paragraph symbols back in -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ǚ"
        replacement="§" />
  </analyzer>
</fieldType>
Log output for the unquoted phrase:
INFO: [] webapp=/solr path=/select
params={facet=true&sort=score+desc&fl=sitzung,gremium,betreff,datum,timestamp,score,aktenzeichen,typ,id,anhang&debugQuery=true&start=0&q=Am+Heidstamm&hl.fl=betreff&wt=json&fq=&hl=true&rows=10}
hits=0 status=0 QTime=29
... and for the quoted one:
INFO: [] webapp=/solr path=/select
params={facet=true&sort=score+desc&fl=sitzung,gremium,betreff,datum,timestamp,score,aktenzeichen,typ,id,anhang&start=0&q="Am+Heidstamm"&hl.fl=betreff&wt=standard&fq=&hl=true&rows=10&version=2.2}
hits=14 status=0 QTime=244
Thanks!
Re: Zero Matches Weirdness
Posted by Dmitry Kan <dm...@gmail.com>.
Ok, thanks, Erik, good to know. Sorry for the confusion.
On Fri, Feb 3, 2012 at 9:42 PM, Erik Hatcher <er...@gmail.com> wrote:
> No, don't do that. That's definitely not good advice. If the analysis
> chain is the same for both index and query, just use <analyzer>.
>
> As for Marian's issue... was there literally a + in the query or was that
> urlencoded? Try debugQuery=true for both queries and see what you get for
> the query parsing output.
>
> Erik
--
Regards,
Dmitry Kan
Re: Zero Matches Weirdness
Posted by Marian Steinbach <ma...@sendung.de>.
I just removed the one field that never matched, "aktenzeichen", from the
qf string. Now it works fine. Solved for now.
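Expressed as configuration, the fix is a change to the qf parameter. The sketch below assumes qf is set as a default on the default search handler in solrconfig.xml (Solr 3.x style, with the handler name "search" taken from the example config); the thread only shows the qf value itself, not where it is defined.

```xml
<!-- solrconfig.xml sketch; handler name and placement of qf are assumptions -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- previously: betreff^5.0 aktenzeichen^10.0 body^0.2 text^0.1 -->
    <str name="qf">betreff^5.0 body^0.2 text^0.1</str>
  </lst>
</requestHandler>
```

With aktenzeichen gone from qf, there should be no per-term clause left that only aktenzeichen can satisfy, as long as the remaining qf fields all drop "am" at query time (in the debug output, text and betreff do).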
Thanks!
Re: Zero Matches Weirdness
Posted by Marian Steinbach <ma...@sendung.de>.
2012/2/3 Erik Hatcher <er...@gmail.com>:
> As for Marian's issue... was there literally a + in the query or was that urlencoded? Try debugQuery=true for both queries and see what you get for the query parsing output.
>
I tested both + and %20, with and without quotes; it makes no
difference whether I use + or %20.
Here is the debug output for the unquoted version (zero hits):
debug: {
rawquerystring: "Am Heidstamm",
querystring: "Am Heidstamm",
parsedquery: "+((DisjunctionMaxQuery((aktenzeichen:Am^10.0))
DisjunctionMaxQuery((text:heidstamm^0.1 | betreff:heidstamm^3.0 |
aktenzeichen:Heidstamm^10.0)))~2)",
parsedquery_toString: "+(((aktenzeichen:Am^10.0)
(text:heidstamm^0.1 | betreff:heidstamm^3.0 |
aktenzeichen:Heidstamm^10.0))~2)",
QParser: "ExtendedDismaxQParser",
}
And for the quoted version (with hits):
{
rawquerystring: ""Am Heidstamm"",
querystring: ""Am Heidstamm"",
parsedquery: "+DisjunctionMaxQuery((text:heidstamm^0.1 |
betreff:heidstamm^3.0 | aktenzeichen:Am Heidstamm^10.0))",
parsedquery_toString: "+(text:heidstamm^0.1 | betreff:heidstamm^3.0
| aktenzeichen:Am Heidstamm^10.0)",
explain: { },
QParser: "ExtendedDismaxQParser",
}
As far as I can see, the "+(((aktenzeichen:Am^10.0) (text:heidstamm^0.1
| betreff:heidstamm^3.0 | aktenzeichen:Heidstamm^10.0))~2)" condition
cannot be fulfilled. I have "AND" as the default operator. The term
"(aktenzeichen:Am^10.0)" cannot be satisfied. The thing is: why does
it even appear there?
This is my current qf:
betreff^5.0 aktenzeichen^10.0 body^0.2 text^0.1
I have just changed this to only
text^0.1
for the sake of testing, and then it works.
It seems as if I haven't quite understood the impact of qf. I thought
it would allow me to boost the score based on a string appearing in a
field. I didn't expect it to affect what matches and what doesn't.
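Some background on why qf affects matching: edismax turns each query term into one DisjunctionMaxQuery across the qf fields, and the "~2" in the parsed query above is the minimum-should-match count, i.e. both per-term clauses are required. The debug output suggests (an inference; the field definition is not shown in the thread) that aktenzeichen is not analyzed with text_de: it keeps "Am" as a term, and the quoted query even becomes the single term "Am Heidstamm" there. So the "Am" clause survives only for aktenzeichen and must match in that field. An alternative to dropping the field, sketched here under that assumption, is to relax the mm parameter. The value -1 ("all but one optional clause required") is only one illustrative choice, and it loosens matching for every multi-term query.

```xml
<!-- fragment of a requestHandler's defaults in solrconfig.xml; mm value is illustrative -->
<lst name="defaults">
  <str name="defType">edismax</str>
  <str name="qf">betreff^5.0 aktenzeichen^10.0 body^0.2 text^0.1</str>
  <!-- "-1": all but one of the optional clauses must match,
       so "Am Heidstamm" needs only one of its two per-term clauses -->
  <str name="mm">-1</str>
</lst>
```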
Marian
Re: Zero Matches Weirdness
Posted by Erik Hatcher <er...@gmail.com>.
No, don't do that. That's definitely not good advice. If the analysis chain is the same for both index and query, just use <analyzer>.
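To make the point concrete: an <analyzer> element without a type attribute is used for both indexing and querying, so the two field types below should analyze text identically. This is an abbreviated sketch with hypothetical type names; only the tokenizer and two filters of text_de are repeated. One practical reason to prefer the single element is that the explicit form has to be kept in sync by hand.

```xml
<!-- Hypothetical type: one analyzer shared by index and query time -->
<fieldType name="text_de_shared" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type: the same chain spelled out twice -->
<fieldType name="text_de_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```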
As for Marian's issue... was there literally a + in the query or was that urlencoded? Try debugQuery=true for both queries and see what you get for the query parsing output.
Erik
On Feb 3, 2012, at 14:18 , Dmitry Kan wrote:
> Actually, I wouldn't count on it and just specify index and query sides
> explicitly. Just to play it safe.
Re: Zero Matches Weirdness
Posted by Dmitry Kan <dm...@gmail.com>.
Actually, I wouldn't count on it and just specify index and query sides
explicitly. Just to play it safe.
On Fri, Feb 3, 2012 at 8:34 PM, Marian Steinbach <ma...@sendung.de> wrote:
> 2012/2/3 Dmitry Kan <dm...@gmail.com>:
> > What about <query> side of the field?
> >
>
> It's identical. At least that's what I think, since I didn't specify
> the type="query" or type="index" attribute for the analyzer part.
>
> Marian
>
--
Regards,
Dmitry Kan
Re: Zero Matches Weirdness
Posted by Marian Steinbach <ma...@sendung.de>.
2012/2/3 Dmitry Kan <dm...@gmail.com>:
> What about <query> side of the field?
>
It's identical. At least that's what I think, since I didn't specify
the type="query" or type="index" attribute for the analyzer part.
Marian
Re: Zero Matches Weirdness
Posted by Dmitry Kan <dm...@gmail.com>.
What about <query> side of the field?
--
Regards,
Dmitry Kan