You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2015/12/01 12:09:11 UTC

Re: LUCENE-5791 and LUCENE-6672 (BasicOperations#determinize() performance)

Hi Irfan,

Newer version of Lucene have a default limit in place for the highest
number of states determinize will create before giving up.

It was added exactly for this reason (accidentally disastrous wildcards).

This is necessary because Lucene converts the incoming wildcard to a
DFA, which has adversarial cases (that your users are hitting!).  It
is also possible, though not implemented in Lucene, to use an NFA and
"be" in multiple states at once.  It likely means slower searching for
non-adversarial cases, but fast searching (compared to today) for
adversarial ones.

I think it would be interesting to explore an NFA implementation for Lucene!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 30, 2015 at 4:46 PM, Irfan Hamid
<ir...@gmail.com> wrote:
> Lucene devs,
>
> We are hitting performance problems when our customers issue pathological
> wildcard queries. Searching the Lucene JIRA I came across these two work
> items and unfortunately it seems like there's no easy way out. However, in
> LUCENE-6672 David Causse has a couple of proposed solutions. I was wondering
> if either of those or something similar were integrated into the code-base
> down the line?
>
> If not, would the community be interested in a pull request if/when we fix
> this in our fork and bake it in production for a while?
>
> TIA,
> Irfan.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: LUCENE-5791 and LUCENE-6672 (BasicOperations#determinize() performance)

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi Irfan,

LUCENE-6672 is not really an issue, because the maxDeterminizedStates
limit is still enforced, just at an earlier stage (determinize) that
that issue expected (after UTF8 conversion of the determinized
automaton).

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 1, 2015 at 12:39 PM, Irfan Hamid
<ir...@gmail.com> wrote:
> Hi Michael,
>
> Is the functionality you're mentioning the same as the one pointed out by
> David Causse in LUCENE-6672? If so he is claiming that maxDeterminizedStates
> is not respected by UTF32ToUTF8 and can thus cause a problem. I'm looking at
> lucene trunk to try and figure it out. However, input from you would be much
> appreciated.
>
> TIA,
> Irfan.
>
> On Tue, Dec 1, 2015 at 3:19 AM, Dawid Weiss <da...@gmail.com> wrote:
>>>
>>>
>>> I think it would be interesting to explore an NFA implementation for
>>> Lucene!
>>>
>>
>> It would be interesting and valuable to have an optimized non-DFA graph
>> engine in general. I'm thinking of something like re2.
>> https://github.com/google/re2
>>
>> Dawid
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: LUCENE-5791 and LUCENE-6672 (BasicOperations#determinize() performance)

Posted by Irfan Hamid <ir...@gmail.com>.
Hi Michael,

Is the functionality you're mentioning the same as the one pointed out by
David Causse in LUCENE-6672
<https://issues.apache.org/jira/browse/LUCENE-6672>? If so he is claiming
that maxDeterminizedStates is not respected by UTF32ToUTF8 and can thus
cause a problem. I'm looking at lucene trunk to try and figure it out.
However, input from you would be much appreciated.

TIA,
Irfan.

On Tue, Dec 1, 2015 at 3:19 AM, Dawid Weiss <da...@gmail.com> wrote:

>
>> I think it would be interesting to explore an NFA implementation for
>> Lucene!
>>
>>
> It would be interesting and valuable to have an optimized non-DFA graph
> engine in general. I'm thinking of something like re2.
> https://github.com/google/re2
>
> Dawid
>
>

Re: LUCENE-5791 and LUCENE-6672 (BasicOperations#determinize() performance)

Posted by Dawid Weiss <da...@gmail.com>.
>
>
> I think it would be interesting to explore an NFA implementation for
> Lucene!
>
>
It would be interesting and valuable to have an optimized non-DFA graph
engine in general. I'm thinking of something like re2.
https://github.com/google/re2

Dawid