Posted to dev@lucene.apache.org by "Nick Barkas (JIRA)" <ji...@apache.org> on 2010/08/24 18:34:16 UTC

[jira] Created: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Queries with too many asterisks causing 100% CPU usage
------------------------------------------------------

                 Key: LUCENE-2620
                 URL: https://issues.apache.org/jira/browse/LUCENE-2620
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 3.0.1
         Environment: Debian Lenny with Tomcat 5.5 and Mac OS X 10.6 with Tomcat 6, probably others
            Reporter: Nick Barkas
         Attachments: lucene-asterisks.diff

If a search query has many adjacent asterisks (e.g. fo**************obar), I can get my webapp caught in a loop that does not seem to end in a reasonable amount of time and may in fact be infinite. For just a few asterisks the query eventually does return some results, but as I add more it takes a longer and longer amount of time. After about six or seven asterisks the query never seems to finish. Even if I abort the search, the thread handling the troublesome query continues running in the background and pinning a CPU.

I found the problem in src/java/org/apache/lucene/search/WildcardTermEnum.java on Lucene 3.0.1 and it looks like 3.0.2 ought to be affected as well. I'm not sure about trunk, though. I have a patch that fixes the problem for me in 3.0.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2620:
-----------------------------------

    Assignee: Robert Muir


[jira] Updated: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2620:
--------------------------------

    Fix Version/s: 2.9.4
                   3.0.3
                   3.1

Assigning 2.9.x and 3.0.x fix versions, as it seems to loop infinitely (or the runtime is so terrible it might as well be infinite).



[jira] Commented: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901960#action_12901960 ] 

Robert Muir commented on LUCENE-2620:
-------------------------------------

Hello Nick, thanks for your patch.

In trunk this is no problem, because wildcard query works in a very different way and both foo**********bar and foo*bar are compiled to the same matcher:
{noformat}
    WildcardQuery wq = new WildcardQuery(new Term("foo", "foo*******bar"));
    WildcardQuery wq2 = new WildcardQuery(new Term("foo", "foo*bar"));
    assertEquals(wq.automaton.getNumberOfStates(), wq2.automaton.getNumberOfStates());
    assertEquals(wq.automaton.getNumberOfTransitions(), wq2.automaton.getNumberOfTransitions());
{noformat}

But at a glance, your patch looks like a potentially useful optimization for 3.x.
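
To illustrate why foo**********bar and foo*bar describe the same set of terms, here is a minimal sketch of normalizing a pattern by collapsing each run of asterisks into one. This is not the automaton code in trunk and not Nick's patch; the class name StarCollapseSketch and the helper collapseStars are hypothetical, and the sketch assumes the pattern syntax has no escape character.

```java
// Hypothetical illustration only; not part of Lucene.
final class StarCollapseSketch {

    /**
     * Returns the pattern with every run of adjacent '*' replaced by a single '*'.
     * Assumes '*' is never escaped, so "**" always means the same as "*".
     */
    static String collapseStars(String pattern) {
        StringBuilder sb = new StringBuilder(pattern.length());
        char prev = 0; // previous character kept in the output
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*' && prev == '*') {
                continue; // a second '*' in a row adds nothing
            }
            sb.append(c);
            prev = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseStars("foo*******bar"));
        System.out.println(collapseStars("foo*bar"));
    }
}
```

Since '*' already matches any sequence of characters, including the empty one, two adjacent asterisks can never match anything a single asterisk cannot, which is why the normalized and original patterns are interchangeable.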



[jira] Updated: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Posted by "Nick Barkas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Barkas updated LUCENE-2620:
--------------------------------

    Attachment: lucene-asterisks.diff


[jira] Resolved: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2620.
---------------------------------

    Resolution: Fixed

Committed to 3.x (988620), 3.0.x (988638), and 2.9.x (988682).

Thanks Nick!


[jira] Updated: (LUCENE-2620) Queries with too many asterisks causing 100% CPU usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2620:
--------------------------------

    Attachment: LUCENE-2620_3x.patch

I took a look at this, and the worst-case behavior in 3x etc is, in my opinion, definitely bug territory.

When 3.x's wildcardEquals() encounters a '*', it does this:
{code}
for (int i = string.length(); i >= s; --i)
{
  if (wildcardEquals(pattern, p, string, i))
  {
    return true;
  }
}
{code}

This is itself already inside a loop in wildcardEquals, so it's a disaster.

I added a test for this, along with Nick's fix (with one needed length check), and the tests pass.
But if you run the test without the change, you will see what Nick is experiencing.
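
The blowup described above can be reproduced with a much simplified matcher. The following is a sketch under stated assumptions, not the actual WildcardTermEnum source and not the committed patch: the class WildcardBlowupSketch, its call counter, and the skipStarRuns flag are all hypothetical names introduced for illustration. It keeps the same shape as the quoted loop (each '*' retries every remaining position in the string) and optionally skips over a run of adjacent asterisks before recursing.

```java
// Hypothetical illustration only; not the Lucene implementation.
final class WildcardBlowupSketch {

    static long calls; // number of recursive invocations in the last run

    // '*' matches any run of characters (including none), '?' exactly one.
    static boolean matches(String pattern, int p, String s, int i, boolean skipStarRuns) {
        calls++;
        if (p == pattern.length()) {
            return i == s.length();
        }
        char c = pattern.charAt(p);
        if (c == '*') {
            int next = p + 1;
            if (skipStarRuns) {
                // Treat "****" as "*": jump past the whole run of asterisks.
                while (next < pattern.length() && pattern.charAt(next) == '*') {
                    next++;
                }
            }
            // Same shape as the quoted loop: try every remaining position.
            for (int k = s.length(); k >= i; --k) {
                if (matches(pattern, next, s, k, skipStarRuns)) {
                    return true;
                }
            }
            return false;
        }
        if (i < s.length() && (c == '?' || c == s.charAt(i))) {
            return matches(pattern, p + 1, s, i + 1, skipStarRuns);
        }
        return false;
    }

    static boolean matches(String pattern, String s, boolean skipStarRuns) {
        return matches(pattern, 0, s, 0, skipStarRuns);
    }

    static long countCalls(String pattern, String s, boolean skipStarRuns) {
        calls = 0;
        matches(pattern, 0, s, 0, skipStarRuns);
        return calls;
    }

    public static void main(String[] args) {
        // A term that does not match forces the matcher to exhaust every branch.
        String term = "foxxxxxxxxxxxxxxxxxx";
        String pattern = "fo******obar";
        System.out.println("naive calls:   " + countCalls(pattern, term, false));
        System.out.println("skipped calls: " + countCalls(pattern, term, true));
    }
}
```

Without the skip, every additional asterisk nests another loop over the rest of the term, so the work on a non-matching term grows rapidly with the number of asterisks. With the skip, a run of asterisks costs the same as a single one, and the match result is unchanged.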

