Posted to issues@lucene.apache.org by "Henrik Hertel (Jira)" <ji...@apache.org> on 2022/05/08 16:51:00 UTC

[jira] [Created] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

Henrik Hertel created LUCENE-10562:
--------------------------------------

Summary: Large system: Wildcard search leads to full index scan despite filter query
Key: LUCENE-10562
URL: https://issues.apache.org/jira/browse/LUCENE-10562
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 8.11.1
Reporter: Henrik Hertel

I use Solr on a large system: a single core holds about 1 TB and roughly 5 million documents, containing the text content of large PDF files. My query becomes extremely slow as soon as I wrap the search term in wildcards, e.g. *searchvalue*, even though I add a filter query that narrows the results to fewer than 20 documents.

searchvalue -> less than 1 second
searchvalue* -> less than 1 second
*searchvalue* -> more than 30 seconds

My query:
select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fq=renditions_ss%3A*&fl=id&rows=50&start=0
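
For readability, the request above can be split into its individual parameters. The following is only a small sketch using Python's standard library; the variable names are illustrative and not part of the original report.

```python
from urllib.parse import parse_qs

# Query string copied from the request above (everything after "select?").
query_string = (
    "defType=lucene&q=content_t:*searchvalue*"
    "&fq=metadataitemids_is:20950&fq=renditions_ss%3A*"
    "&fl=id&rows=50&start=0"
)

# parse_qs URL-decodes the values and groups repeated keys such as "fq".
params = parse_qs(query_string)

for name, values in params.items():
    for value in values:
        print(f"{name} = {value}")
```

This makes it easier to see that the main query (q) is the wildcard query on content_t, while two separate filter queries (fq) restrict metadataitemids_is and renditions_ss.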

I've tried everything imaginable. It doesn't make sense to me why a search over such a small subset should take so long. If I omit the filter query metadataitemids_is:20950 and search the entire index, the query takes the same amount of time. I therefore suspect that, despite the filter query, the main query runs over the entire index.
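
One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the cost lies in enumerating the terms that match the wildcard pattern, not in visiting documents. The toy model below is not Lucene's implementation (Lucene uses automata over its term dictionary); the index contents and function names are hypothetical. It only illustrates why a prefix pattern can seek into a sorted term list, while a pattern with a leading wildcard has to examine every term, regardless of how few documents a filter allows.

```python
import bisect
from fnmatch import fnmatchcase

# Hypothetical toy inverted index: term -> document ids containing it.
index = {
    "alpha": [1, 2],
    "mysearchvalue": [3],
    "searchvalue": [1, 3],
    "searchvalues": [2],
    "zeta": [4],
}
terms = sorted(index)  # a sorted term list stands in for the term dictionary


def prefix_terms(prefix):
    """Seek to the first term >= prefix, then walk while the prefix matches."""
    examined, matches = 0, []
    start = bisect.bisect_left(terms, prefix)
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches, examined


def infix_terms(pattern):
    """A leading wildcard gives no seek point, so every term is checked."""
    examined, matches = 0, []
    for term in terms:
        examined += 1
        if fnmatchcase(term, pattern):
            matches.append(term)
    return matches, examined


def search(matching_terms, allowed_docs):
    """Collect documents for the matched terms, then apply the filter."""
    docs = set()
    for term in matching_terms:
        docs.update(index[term])
    return sorted(docs & allowed_docs)


prefix_matches, prefix_cost = prefix_terms("searchvalue")
infix_matches, infix_cost = infix_terms("*searchvalue*")

# The filter shrinks the result, but not the number of terms examined.
print(prefix_cost, search(prefix_matches, {3}))
print(infix_cost, search(infix_matches, {3}))
```

In this model the number of terms examined for *searchvalue* equals the size of the whole term list whether or not a filter is applied, which would be consistent with the timings reported above. Whether the same holds for the real index would need to be confirmed by profiling.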

