Posted to solr-user@lucene.apache.org by "O. Klein" <kl...@octoweb.nl> on 2011/08/19 21:02:09 UTC

Terms.regex performance issue

Because I want to use it for autocomplete, it has to be fast. With terms.prefix
I get results in around 100 milliseconds, while terms.regex is 10 to 20 times
slower.

Not storing the field made it a bit faster, but not enough. The index is on a
separate core and is only about 5 MB. Are there any tricks to make it a lot
faster? Or do I have to switch to n-grams or something?
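A minimal Python sketch (not Solr's actual code) of why the two parameters can differ so much: with the terms kept in sorted order, a prefix lookup binary-searches to the first candidate and stops at the first term that no longer matches, while an arbitrary regex gives no starting or stopping point, so every term has to be tested. The term list is made up for illustration, and both functions also report how many terms they examined.

```python
import bisect
import re


def prefix_lookup(sorted_terms, prefix):
    """Return (matches, terms_examined) for a prefix lookup.

    Mimics seeking in a sorted term dictionary: binary-search to the first
    term >= prefix, then walk forward until a term no longer has the prefix.
    """
    i = bisect.bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    while i < len(sorted_terms):
        examined += 1
        if not sorted_terms[i].startswith(prefix):
            break  # sorted order: no later term can have this prefix
        matches.append(sorted_terms[i])
        i += 1
    return matches, examined


def regex_lookup(sorted_terms, pattern):
    """Return (matches, terms_examined) for a regex lookup.

    An arbitrary pattern gives no place to start or stop, so every
    term has to be run through the regex engine.
    """
    compiled = re.compile(pattern)
    matches = [t for t in sorted_terms if compiled.fullmatch(t)]
    return matches, len(sorted_terms)


terms = sorted(["pineapple", "apple", "banana", "applet", "apply", "grape"])
print(prefix_lookup(terms, "appl"))      # examines only the "appl" range
print(regex_lookup(terms, ".*apple.*"))  # examines every term
```

With six terms the difference is trivial; the point is that the regex cost grows with the total number of terms in the field, while the prefix cost grows only with the number of matching terms.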

--
View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3268994.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Terms.regex performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
TermsComponent uses java.util.regex, which is not particularly fast. As the
number of terms grows, your CPU is going to overheat. I'd prefer an analyzer
approach.
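One way to read the "analyzer approach" (and the n-grams mentioned in the question) is to split each term into character n-grams once, at index time, so that a substring lookup becomes a few dictionary lookups plus a verification step instead of a regex over every term. The Python below is only a sketch of that idea, with an arbitrarily chosen n=3 and hypothetical function names; in Solr the grams would be produced by an n-gram filter in the field's analyzer chain, and no specific configuration is implied here.

```python
from collections import defaultdict


def ngrams(term, n):
    """All character n-grams of term; a term no longer than n is its own gram."""
    if len(term) <= n:
        return {term}
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def build_ngram_index(terms, n=3):
    """Map each n-gram to the set of terms containing it (done once, up front)."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def infix_lookup(index, terms, query, n=3):
    """Sorted list of terms that contain query as a substring."""
    if len(query) < n:
        # Too short to use the gram index: fall back to scanning every term.
        return sorted(t for t in terms if query in t)
    grams = ngrams(query, n)
    # Candidates must contain every gram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Grams can be present without being contiguous, so verify each candidate.
    return sorted(t for t in candidates if query in t)


terms = ["pineapple", "apple", "banana", "applet", "grape"]
index = build_ngram_index(terms)
print(infix_lookup(index, terms, "apple"))
```

The trade-off is a larger index in exchange for lookups that touch only the terms sharing the query's grams.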

> As I want to use it in an Autocomplete it has to be fast. Terms.prefix gets
> results in around 100 milliseconds, while terms.regex is 10 to 20 times
> slower.
> 
> Not storing the field made it a bit faster but not enough. The index is on
> a separate core and only about 5Mb big. Are there some tricks to make it
> work a lot faster? Or do I have to switch to ngrams or something?

Re: Terms.regex performance issue

Posted by "O. Klein" <kl...@octoweb.nl>.
Read http://lucene.472066.n3.nabble.com/suggester-issues-td3262718.html for
more info about the QueryConverter. IMO the Suggester should make it easier to
choose between QueryConverters.

As for infix, the wiki says it's a planned feature, but the Suggester hasn't
been worked on for a couple of months. So I guess we will have to wait :)

Re: Terms.regex performance issue

Posted by tbarbugli <tb...@gmail.com>.
Hi,
I have the same problem: I am looking for infix autocomplete. Could you
elaborate a bit on your QueryConverter/Suggester solution?
Thank you!

Re: Terms.regex performance issue

Posted by "O. Klein" <kl...@octoweb.nl>.
I see now in the Suggester wiki that support for infix suggestions is planned
for FSTLookup (which would be the only structure to support these).

Re: Terms.regex performance issue

Posted by "O. Klein" <kl...@octoweb.nl>.
Of course. That's why I compared prefix to bla* and saw it was already a lot
slower.

Re: Terms.regex performance issue

Posted by Erick Erickson <er...@gmail.com>.
Ah, in that case, comparing prefix and regex is an apples-to-oranges
comparison. I expect regex to be slower, but a fairer comparison
would be prefix to stuff* (which may be changed into a prefix
enumeration for all I know). But comparing infix to prefix doesn't tell you
much really....
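The idea of turning stuff* into a prefix enumeration can be sketched in Python. Whether TermsComponent does this is not established in this thread; the point is only that a regex starting with literal text lets the lookup seek to that text and test just the terms in that range, while .*query.* has no literal prefix and falls back to a full scan. literal_prefix is a hypothetical, deliberately conservative helper that understands only simple patterns.

```python
import bisect
import re

# Characters treated as regex syntax by this simplified parser.
_META = set(".^$*+?{}[]()|\\")


def literal_prefix(pattern):
    """Hypothetical helper: literal text every match must start with.

    Conservative by design: it stops at the first metacharacter, drops the
    last literal when a *, ? or { quantifier makes it optional, and gives up
    on alternation. A shorter prefix is always safe, only slower.
    """
    if "|" in pattern:
        return ""
    prefix = ""
    for ch in pattern:
        if ch in _META:
            if ch in "*?{":
                prefix = prefix[:-1]
            break
        prefix += ch
    return prefix


def regex_lookup_with_prefix(sorted_terms, pattern):
    """Return (matches, terms_examined), testing only the literal-prefix range."""
    prefix = literal_prefix(pattern)
    compiled = re.compile(pattern)
    i = bisect.bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        examined += 1
        if compiled.fullmatch(sorted_terms[i]):
            matches.append(sorted_terms[i])
        i += 1
    return matches, examined


terms = sorted(["pineapple", "apple", "banana", "applet", "apply", "grape"])
print(regex_lookup_with_prefix(terms, "appl[ey]"))   # only the "appl" range
print(regex_lookup_with_prefix(terms, ".*apple.*"))  # empty prefix: full scan
```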

Best
Erick

P.S. There's no reason to do anything if you have a solution that works
already though.

On Sun, Aug 21, 2011 at 12:56 PM, O. Klein <kl...@octoweb.nl> wrote:
> Yeah, I was searching infix. It worked very nicely for autocomplete.
>
> Made a custom QueryConverter for the Suggester so it gives proper
> suggestions for shingles. Will stick with that for now.
>
> Thanks for the feedback.

Re: Terms.regex performance issue

Posted by "O. Klein" <kl...@octoweb.nl>.
Yeah, I was searching infix. It worked very nicely for autocomplete.

I made a custom QueryConverter for the Suggester so it gives proper
suggestions for shingles. I will stick with that for now.
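The thread does not include the custom QueryConverter itself. As a rough illustration of why shingles need one, assuming the default conversion splits the input into separate words: "new yo" is then completed as two unrelated words, whereas keeping the whole input as one token lets it match multi-word shingle entries. The Python below models only that difference; the function names are hypothetical and do not reflect Solr's QueryConverter API.

```python
def split_converter(query):
    """Hypothetical converter: every whitespace-separated word is its own token."""
    return [word.lower() for word in query.split()]


def whole_input_converter(query):
    """Hypothetical converter: keep the normalized input as a single token,
    so it can be matched against multi-word shingle entries."""
    normalized = " ".join(query.lower().split())
    return [normalized] if normalized else []


def suggest(entries, tokens, limit=5):
    """Prefix-complete each token against a sorted list of dictionary entries."""
    return {tok: [e for e in entries if e.startswith(tok)][:limit] for tok in tokens}


# Dictionary as it might look for a shingled field: single words plus word groups.
entries = sorted(["new", "new york", "new york city", "york", "yorkshire"])
print(suggest(entries, split_converter("New yo")))
print(suggest(entries, whole_input_converter("New yo")))
```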

Thanks for the feedback.

Re: Terms.regex performance issue

Posted by Erick Erickson <er...@gmail.com>.
Wait. Sometimes I get confused because Gmail will substitute
* for bolding, so in my client it looks like you're searching infix (i.e.
leading and trailing wildcards). If that's the case, then your performance
will always be poor; it has to enumerate all the terms in the field...

If it's just bolding confusing me, then never mind....

Best
Erick

On Fri, Aug 19, 2011 at 8:27 PM, O. Klein <kl...@octoweb.nl> wrote:
> Terms.prefix was just to compare performance.
>
> The use case was terms.regex=.*query.*, and as Markus pointed out, this will
> probably remain a bottleneck.
>
> I looked at the Suggester, but like many others I have been struggling to
> make it useful. It needs a custom QueryConverter to give proper suggestions,
> but I haven't tried this yet.

Re: Terms.regex performance issue

Posted by "O. Klein" <kl...@octoweb.nl>.
Terms.prefix was just to compare performance.

The use case was terms.regex=.*query.*, and as Markus pointed out, this will
probably remain a bottleneck.
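Assuming TermsComponent applies the regex to the whole term (as java.util.regex's Matcher.matches() does; Python's re.fullmatch plays that role below), the leading and trailing .* are what make this an infix search. This is a small sketch with a made-up term list. The plain substring test gives the same answer but still checks every term, so neither form avoids the full scan.

```python
import re


def regex_infix(terms, query):
    """Infix match written the way terms.regex=.*query.* is: the pattern must
    cover the whole term, hence the .* on both sides."""
    pattern = re.compile(".*" + re.escape(query) + ".*")
    return [t for t in terms if pattern.fullmatch(t)]


def substring_infix(terms, query):
    """Same result with a plain substring test; still one check per term."""
    return [t for t in terms if query in t]


terms = ["apple", "banana", "pineapple", "grape"]
print([t for t in terms if re.fullmatch("apple", t)])  # whole-term match only
print(regex_infix(terms, "apple"))
print(substring_infix(terms, "apple"))
```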

I looked at the Suggester, but like many others I have been struggling to
make it useful. It needs a custom QueryConverter to give proper suggestions,
but I haven't tried this yet.

Re: Terms.regex performance issue

Posted by Bill Bell <bi...@gmail.com>.
We do something like:

http://localhost:8983/solr/provs/terms?terms.fl=payor&terms.regex.flag=case_insensitive&terms.regex=%28.*%29WHAT USER TYPES%28.*%29&terms.limit=-1


We want to match not just the prefix but anywhere in the term.
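The URL above puts the raw user input straight into the regex, so characters such as "(" or "." change the pattern, and the spaces need URL encoding. A minimal Python sketch of building the same request more safely, reusing the host, core (provs), field (payor) and parameters from the example; java_regex_quote and build_terms_url are hypothetical helpers, and .* is used in place of (.*), which matches the same terms without capture groups.

```python
from urllib.parse import urlencode


def java_regex_quote(text):
    """Hypothetical helper: backslash-escape every non-alphanumeric character.

    java.util.regex accepts a backslash before any non-alphabetic character,
    so this stops input such as '(' or '.' from being read as regex syntax.
    """
    return "".join(ch if ch.isalnum() else "\\" + ch for ch in text)


def build_terms_url(base, field, user_input, limit=-1):
    """Hypothetical helper: TermsComponent request for terms containing user_input."""
    params = {
        "terms.fl": field,
        "terms.regex.flag": "case_insensitive",
        "terms.regex": ".*" + java_regex_quote(user_input) + ".*",
        "terms.limit": limit,
    }
    # urlencode percent-encodes the regex; spaces become '+'.
    return base + "?" + urlencode(params)


url = build_terms_url("http://localhost:8983/solr/provs/terms", "payor", "blue cross (ppo)")
print(url)
```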



On 8/19/11 5:21 PM, "Chris Hostetter" <ho...@fucit.org> wrote:

>
>: Subject: Terms.regex performance issue
>: 
>: As I want to use it in an Autocomplete it has to be fast. Terms.prefix gets
>: results in around 100 milliseconds, while terms.regex is 10 to 20 times
>: slower.
>
>can you elaborate on how you are using terms.regex?  what does your regex
>look like? .. particularly if your usecase is autocomplete terms.prefix
>seems like an odd choice.
>
>Possible XY Problem?
>https://people.apache.org/~hossman/#xyproblem
>
>Have you looked at using the Suggester plugin?
>
>https://wiki.apache.org/solr/Suggester
>
>
>-Hoss



Re: Terms.regex performance issue

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Terms.regex performance issue
: 
: As I want to use it in an Autocomplete it has to be fast. Terms.prefix gets
: results in around 100 milliseconds, while terms.regex is 10 to 20 times
: slower.

Can you elaborate on how you are using terms.regex? What does your regex
look like? Particularly if your use case is autocomplete, terms.prefix
seems like an odd choice.

Possible XY Problem?
https://people.apache.org/~hossman/#xyproblem

Have you looked at using the Suggester plugin?

https://wiki.apache.org/solr/Suggester


-Hoss