Posted to solr-user@lucene.apache.org by Melanie Drake <me...@gmail.com> on 2011/04/01 17:47:06 UTC

wildcard search inconsistencies

I noticed an inconsistency in results when performing wildcard searches. 
When searching on variations of "conditional" the following results
occurred:

conditional - hits
conditional* - hits
conditi* - hits
condit* - hits
con*al - no hits
c?nditional - no hits
c*ld - hits (on a different word: "child")

I don't see an obvious pattern to when the wildcard searches work.

In a response to another post, I read that stemming will cause wildcard
searches to behave strangely.  I believe we may be using stemming, although
the only configuration I see is the list of words protected against stemming
defined in protwords.txt.

Also, I'm not sure if it's helpful, but I see a vague Solr error in my
server log (jboss) any time I perform a search (whether successful or not):
ERROR [STDERR] <timestamp> org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=null path=/select
params={q=condi*al&fq=url%3A%28%22%2F<long list of application-specific IDs
used for filtering>&fl=score&hl=true&hl.fragsize=50&hl.snippets=3} hits=0
status=0 QTime=0

The developer who implemented our search solution is no longer with our
company, so I'm just looking for any information useful to investigate this
issue.  I apologize if I omitted any necessary information.  Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/wildcard-search-inconsistencies-tp2763787p2763787.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: wildcard search inconsistencies

Posted by lboutros <bo...@gmail.com>.
And to be more helpful: you can enable debug mode (debugQuery=on in the
query) to see the transformed query:

for instance, for 'field:conditional':

field:conditional
field:conditional
field:condit
field:condit

for 'field:conditional*' :

field:conditional*
field:conditional*
field:conditional*
field:conditional*

and for 'field:con*al' :

field:con*al
field:con*al
field:con*al
field:con*al

But in the index, the word 'conditional' is stored as 'condit', so it is
not matched by 'con*al'. The word 'conceal' (stored as is) and the word
'congealable' (stored as 'congeal') are matched and retrieved (and
highlighted, if highlighting is configured correctly).
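
To make the mismatch concrete, here is a minimal Python sketch. It is not
Solr code: the toy index, the STORED_TERMS name and the wildcard_hits
helper are hypothetical, and fnmatch only approximates the '*' and '?'
wildcards. The stems are the ones reported in this thread; 'child' is
assumed to be stored unchanged.

```python
from fnmatch import fnmatchcase

# Hypothetical toy index: original word -> term actually stored after stemming.
# Stems are taken from this thread; "child" is assumed to be stored as is.
STORED_TERMS = {
    "conditional": "condit",
    "conceal": "conceal",
    "congealable": "congeal",
    "child": "child",
}


def wildcard_hits(pattern):
    """Return the original words whose *stored* term matches the pattern.

    The pattern itself is not stemmed, which mirrors the debug output above:
    'con*al' stays 'con*al' and is compared against the stored terms.
    """
    return sorted(
        word for word, term in STORED_TERMS.items() if fnmatchcase(term, pattern)
    )


if __name__ == "__main__":
    for pattern in ("con*al", "c?nditional", "condit*", "c*ld"):
        print(pattern, "->", wildcard_hits(pattern))
```

In this sketch 'con*al' finds 'conceal' and 'congealable' but not
'conditional', because the pattern is only ever compared with 'condit'.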

Ludovic.

-----
Jouve
France.

Re: wildcard search inconsistencies

Posted by Melanie Drake <me...@gmail.com>.
Thanks, Ludovic.  That was it.  I added the word "conditional" to the
protected words file and I no longer see the odd search results when using
wildcards.  I will try to disable stemming altogether.  
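
Why protecting the word helps can be sketched in a few lines of Python.
This is only an illustration under assumptions: TOY_STEMS, index_term and
matches are hypothetical names, the stem table contains just the one stem
reported in this thread, and fnmatch stands in for the wildcard matching.
In a real index the affected documents would presumably also have to be
reindexed before the stored terms change.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the stemmer: only the stem reported in this thread.
TOY_STEMS = {"conditional": "condit"}


def index_term(word, protected=frozenset()):
    """Return the term stored for a word; protected words bypass stemming."""
    if word in protected:
        return word
    return TOY_STEMS.get(word, word)


def matches(word, pattern, protected=frozenset()):
    """True if the wildcard pattern matches the term stored for the word."""
    return fnmatchcase(index_term(word, protected), pattern)


if __name__ == "__main__":
    print(matches("conditional", "con*al"))                   # stemmed term
    print(matches("conditional", "con*al", {"conditional"}))  # protected term
```

With the word protected, the stored term is 'conditional' itself, so both
'con*al' and 'c?nditional' can match it.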

Thanks again!


Re: wildcard search inconsistencies

Posted by lboutros <bo...@gmail.com>.
'conditional' seems to be stemmed to 'condit' in the index.

So your results are normal: the wildcard term is not stemmed at query time,
so it is compared against the stemmed terms in the index.

As you said, mixing wildcard searches and stemmed fields is not
recommended.

Ludovic.

-----
Jouve
France.