Posted to solr-user@lucene.apache.org by Ueland <to...@gmail.com> on 2011/04/10 16:44:46 UTC

Performance with search terms starting and ending with wildcards

Hi!

I have been doing some testing with Solr and wildcards. Queries like:

- *foo
- foo*

complete quickly (1-2 s) on a test index of about 40-50 GB.

But when I search for *foo*, the search time can easily climb past 30
seconds.

Any ideas on how that issue can be worked around? 

One workaround would be to rewrite *foo* as (*foo OR foo* OR oof* OR *oof)
(is the reverse even needed?). But, logically enough, that will not give the
same results as *foo*.
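
To illustrate why the rewritten query cannot return the same results, here is a minimal sketch in plain Python. It uses shell-style matching from the standard library as a stand-in for wildcard matching against indexed terms; the sample terms are made up for the example, and none of this is Solr code.

```python
from fnmatch import fnmatchcase

def matches_any(term, patterns):
    """True if the term matches at least one of the wildcard patterns."""
    return any(fnmatchcase(term, p) for p in patterns)

# Hypothetical indexed terms, chosen only for illustration.
terms = ["foobar", "barfoo", "xfoox", "foo", "bar"]

# Terms matched by the infix wildcard *foo*.
infix = [t for t in terms if fnmatchcase(t, "*foo*")]

# Terms matched by the prefix/suffix rewrite (*foo OR foo*).
union = [t for t in terms if matches_any(t, ["*foo", "foo*"])]

# Terms the rewrite loses: "foo" sits strictly inside the term.
missed = [t for t in infix if t not in union]
print(missed)  # ['xfoox']
```

The reversed patterns are left out of the sketch because they add nothing: against a reversed copy of each term, oof* selects the same terms as *foo does against the original.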

I have also tried setting maxTimeAllowed, but it is simply ignored. I guess
that is related to either sorting or the wildcard search itself.

--
View this message in context: http://lucene.472066.n3.nabble.com/Performance-with-search-terms-starting-and-ending-with-wildcards-tp2802561p2802561.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Performance with search terms starting and ending with wildcards

Posted by Ueland <to...@gmail.com>.
> Which version of Solr are you using?

Currently testing with 3.1

> NGrams could be an option, but could you give us the field definition from
> your schema? And the word count in this field's index?

I won't share the complete schema, but I can summarize it:

For testing, we have around 30 fields that give us what we need from
documents ranging from a single line to several MB of plain text. Because of
that size, we have limited the copyFields to a maximum of 10,000 characters
to keep the index size down a bit.

We did a quick test of n-grams. The problem was that the index grew from
around 90 GB until the disk filled up at 300 GB. (We tested with more data
and fields, hence the larger index.)

The fact that an n-gram index becomes so large is a bit problematic.
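
As a rough illustration of where that growth comes from, the sketch below counts the character n-grams produced from a single term. The gram sizes (2 to 4) and the sample term are assumptions picked for the example, not the settings used in the test above, and the function is a plain-Python approximation rather than Solr's own tokenizer.

```python
def ngrams(term, min_n, max_n):
    """Return every character n-gram of term with min_n <= n <= max_n."""
    return [term[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(term) - n + 1)]

# One 8-character term turns into many indexed grams.
grams = ngrams("wildcard", 2, 4)
print(len(grams))       # 18  (7 bigrams + 6 trigrams + 5 four-grams)

# The payoff: a substring search becomes an exact lookup of one gram.
print("card" in grams)  # True
```

Every term is multiplied in this way, so narrowing the gram size range or applying n-grams to fewer fields reduces the index size, at the cost of which substring lengths can be matched directly.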

Another interesting note: even when I use the queryFilter to limit the
documents to search in, the query is extremely slow (30+ seconds).


Re: Performance with search terms starting and ending with wildcards

Posted by lboutros <bo...@gmail.com>.
Which version of Solr are you using?

NGrams could be an option, but could you give us the field definition from
your schema? And the word count in this field's index?

Ludovic.




-----
Jouve
France.

Re: Performance with search terms starting and ending with wildcards

Posted by Ueland <to...@gmail.com>.
Hi!

Thanks for the reply.

We decided to give n-grams another try. After much tweaking and tuning, both
the index size and the speed were more than good enough for our needs. So it
looks like n-grams were the solution for us after all :)

Best regards
Tor Henning Ueland


Re: Performance with search terms starting and ending with wildcards

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

Perhaps you should give Lucene/Solr trunk a try and compare! Wildcard queries
in trunk should be much faster.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


