You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2010/07/26 19:16:27 UTC

TestRegexpRandom2

My random stress testing hit an IllegalArgExc because the random
regexp was malformed.

Does this patch look OK to fix?

Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
===================================================================
--- src/test/org/apache/lucene/search/TestRegexpRandom2.java	(revision 979227)
+++ src/test/org/apache/lucene/search/TestRegexpRandom2.java	(working copy)
@@ -130,6 +130,14 @@
    * simple regexpquery implementation.
    */
   private void assertSame(String regexp) throws IOException {
+    try {
+      new RegExp(regexp);
+    } catch (IllegalArgumentException iae) {
+      // the random regexp could be malformed, eg "foo"bar",
+      // so we ignore this
+      return;
+    }
+
     RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
     DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: TestRegexpRandom2

Posted by Robert Muir <rc...@gmail.com>.
I thought about it, i think this is the best fix I think RegExp.toString is
just for debugging, not really guaranteed to be reparseable.

On Mon, Jul 26, 2010 at 1:49 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> OK lemme try...
>
> Mike
>
> On Mon, Jul 26, 2010 at 1:47 PM, Robert Muir <rc...@gmail.com> wrote:
> > maybe you can try this on your beast computer for a while? I think its
> > better.
> > Index: lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
> > ===================================================================
> > --- lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
> > (revision 979377)
> > +++ lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
> (working
> > copy)
> > @@ -86,9 +86,9 @@
> >    private class DumbRegexpQuery extends MultiTermQuery {
> >      private final Automaton automaton;
> >
> > -    DumbRegexpQuery(Term term) {
> > +    DumbRegexpQuery(Term term, int flags) {
> >        super(term.field());
> > -      RegExp re = new RegExp(term.text());
> > +      RegExp re = new RegExp(term.text(), flags);
> >        automaton = re.toAutomaton();
> >      }
> >
> > @@ -130,8 +130,8 @@
> >     * simple regexpquery implementation.
> >     */
> >    private void assertSame(String regexp) throws IOException {
> > -    RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
> > -    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field",
> regexp));
> > +    RegexpQuery smart = new RegexpQuery(new Term("field", regexp),
> > RegExp.NONE);
> > +    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field",
> regexp),
> > RegExp.NONE);
> >
> >      // we can't compare the two if automaton rewrites to a simpler enum.
> >      // for example: "a\uda07\udcc7?.*?" gets rewritten to a simpler
> query:
> > Index:
> > lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> > ===================================================================
> > ---
> lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> > (revision 979377)
> > +++
> lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> > (working copy)
> > @@ -40,7 +40,9 @@
> >        if (!UnicodeUtil.validUTF16String(regexp))
> >          continue;
> >        try {
> > -        return new RegExp(regexp, RegExp.NONE);
> > +        // NOTE: we parse-tostring-parse again, because we are
> > +        // really abusing RegExp.toString() here (its just for
> debugging)
> > +        return new RegExp(new RegExp(regexp, RegExp.NONE).toString(),
> > RegExp.NONE);
> >        } catch (Exception e) {}
> >      }
> >    }
> >
> > On Mon, Jul 26, 2010 at 1:32 PM, Michael McCandless
> > <lu...@mikemccandless.com> wrote:
> >>
> >> Hmm that doesn't fix it.
> >>
> >> The magical seed is: 50686536365145364L (for TRR2).
> >>
> >> I still see the IAE even if I remove that RegExp.NONE.
> >>
> >> Mike
> >>
> >> On Mon, Jul 26, 2010 at 1:28 PM, Robert Muir <rc...@gmail.com> wrote:
> >> > I don't think we should do that. I think i found the problem:
> >> > AutomatonTestUtil's randomRegexp does this:
> >> >       try {
> >> >         return new RegExp(regexp, RegExp.NONE);
> >> >       } catch (Exception e) {}
> >> > I think the RegExp.NONE is the problem, and we should remove it.
> >> > because the test then toString's it and compiles it again, but without
> >> > this
> >> > option.
> >> > On Mon, Jul 26, 2010 at 1:16 PM, Michael McCandless
> >> > <lu...@mikemccandless.com> wrote:
> >> >>
> >> >> My random stress testing hit an IllegalArgExc because the random
> >> >> regexp was malformed.
> >> >>
> >> >> Does this patch look OK to fix?
> >> >>
> >> >> Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
> >> >> ===================================================================
> >> >> --- src/test/org/apache/lucene/search/TestRegexpRandom2.java
> >> >>  (revision
> >> >> 979227)
> >> >> +++ src/test/org/apache/lucene/search/TestRegexpRandom2.java
> >> >>  (working
> >> >> copy)
> >> >> @@ -130,6 +130,14 @@
> >> >>    * simple regexpquery implementation.
> >> >>    */
> >> >>   private void assertSame(String regexp) throws IOException {
> >> >> +    try {
> >> >> +      new RegExp(regexp);
> >> >> +    } catch (IllegalArgumentException iae) {
> >> >> +      // the random regexp could be malformed, eg "foo"bar",
> >> >> +      // so we ignore this
> >> >> +      return;
> >> >> +    }
> >> >> +
> >> >>     RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
> >> >>     DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field",
> >> >> regexp));
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Robert Muir
> >> > rcmuir@gmail.com
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: TestRegexpRandom2

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK lemme try...

Mike

On Mon, Jul 26, 2010 at 1:47 PM, Robert Muir <rc...@gmail.com> wrote:
> maybe you can try this on your beast computer for a while? I think its
> better.
> Index: lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
> ===================================================================
> --- lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
> (revision 979377)
> +++ lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java (working
> copy)
> @@ -86,9 +86,9 @@
>    private class DumbRegexpQuery extends MultiTermQuery {
>      private final Automaton automaton;
>
> -    DumbRegexpQuery(Term term) {
> +    DumbRegexpQuery(Term term, int flags) {
>        super(term.field());
> -      RegExp re = new RegExp(term.text());
> +      RegExp re = new RegExp(term.text(), flags);
>        automaton = re.toAutomaton();
>      }
>
> @@ -130,8 +130,8 @@
>     * simple regexpquery implementation.
>     */
>    private void assertSame(String regexp) throws IOException {
> -    RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
> -    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));
> +    RegexpQuery smart = new RegexpQuery(new Term("field", regexp),
> RegExp.NONE);
> +    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp),
> RegExp.NONE);
>
>      // we can't compare the two if automaton rewrites to a simpler enum.
>      // for example: "a\uda07\udcc7?.*?" gets rewritten to a simpler query:
> Index:
> lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> ===================================================================
> --- lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> (revision 979377)
> +++ lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> (working copy)
> @@ -40,7 +40,9 @@
>        if (!UnicodeUtil.validUTF16String(regexp))
>          continue;
>        try {
> -        return new RegExp(regexp, RegExp.NONE);
> +        // NOTE: we parse-tostring-parse again, because we are
> +        // really abusing RegExp.toString() here (its just for debugging)
> +        return new RegExp(new RegExp(regexp, RegExp.NONE).toString(),
> RegExp.NONE);
>        } catch (Exception e) {}
>      }
>    }
>
> On Mon, Jul 26, 2010 at 1:32 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> Hmm that doesn't fix it.
>>
>> The magical seed is: 50686536365145364L (for TRR2).
>>
>> I still see the IAE even if I remove that RegExp.NONE.
>>
>> Mike
>>
>> On Mon, Jul 26, 2010 at 1:28 PM, Robert Muir <rc...@gmail.com> wrote:
>> > I don't think we should do that. I think i found the problem:
>> > AutomatonTestUtil's randomRegexp does this:
>> >       try {
>> >         return new RegExp(regexp, RegExp.NONE);
>> >       } catch (Exception e) {}
>> > I think the RegExp.NONE is the problem, and we should remove it.
>> > because the test then toString's it and compiles it again, but without
>> > this
>> > option.
>> > On Mon, Jul 26, 2010 at 1:16 PM, Michael McCandless
>> > <lu...@mikemccandless.com> wrote:
>> >>
>> >> My random stress testing hit an IllegalArgExc because the random
>> >> regexp was malformed.
>> >>
>> >> Does this patch look OK to fix?
>> >>
>> >> Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
>> >> ===================================================================
>> >> --- src/test/org/apache/lucene/search/TestRegexpRandom2.java
>> >>  (revision
>> >> 979227)
>> >> +++ src/test/org/apache/lucene/search/TestRegexpRandom2.java
>> >>  (working
>> >> copy)
>> >> @@ -130,6 +130,14 @@
>> >>    * simple regexpquery implementation.
>> >>    */
>> >>   private void assertSame(String regexp) throws IOException {
>> >> +    try {
>> >> +      new RegExp(regexp);
>> >> +    } catch (IllegalArgumentException iae) {
>> >> +      // the random regexp could be malformed, eg "foo"bar",
>> >> +      // so we ignore this
>> >> +      return;
>> >> +    }
>> >> +
>> >>     RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
>> >>     DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field",
>> >> regexp));
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>
>> >
>> >
>> >
>> > --
>> > Robert Muir
>> > rcmuir@gmail.com
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: TestRegexpRandom2

Posted by Robert Muir <rc...@gmail.com>.
maybe you can try this on your beast computer for a while? I think its
better.

Index: lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
===================================================================
--- lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java (revision
979377)
+++ lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java (working
copy)
@@ -86,9 +86,9 @@
   private class DumbRegexpQuery extends MultiTermQuery {
     private final Automaton automaton;

-    DumbRegexpQuery(Term term) {
+    DumbRegexpQuery(Term term, int flags) {
       super(term.field());
-      RegExp re = new RegExp(term.text());
+      RegExp re = new RegExp(term.text(), flags);
       automaton = re.toAutomaton();
     }

@@ -130,8 +130,8 @@
    * simple regexpquery implementation.
    */
   private void assertSame(String regexp) throws IOException {
-    RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
-    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));
+    RegexpQuery smart = new RegexpQuery(new Term("field", regexp),
RegExp.NONE);
+    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp),
RegExp.NONE);

     // we can't compare the two if automaton rewrites to a simpler enum.
     // for example: "a\uda07\udcc7?.*?" gets rewritten to a simpler query:
Index:
lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
===================================================================
--- lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
(revision
979377)
+++ lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
(working
copy)
@@ -40,7 +40,9 @@
       if (!UnicodeUtil.validUTF16String(regexp))
         continue;
       try {
-        return new RegExp(regexp, RegExp.NONE);
+        // NOTE: we parse-tostring-parse again, because we are
+        // really abusing RegExp.toString() here (its just for debugging)
+        return new RegExp(new RegExp(regexp, RegExp.NONE).toString(),
RegExp.NONE);
       } catch (Exception e) {}
     }
   }


On Mon, Jul 26, 2010 at 1:32 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Hmm that doesn't fix it.
>
> The magical seed is: 50686536365145364L (for TRR2).
>
> I still see the IAE even if I remove that RegExp.NONE.
>
> Mike
>
> On Mon, Jul 26, 2010 at 1:28 PM, Robert Muir <rc...@gmail.com> wrote:
> > I don't think we should do that. I think i found the problem:
> > AutomatonTestUtil's randomRegexp does this:
> >       try {
> >         return new RegExp(regexp, RegExp.NONE);
> >       } catch (Exception e) {}
> > I think the RegExp.NONE is the problem, and we should remove it.
> > because the test then toString's it and compiles it again, but without
> this
> > option.
> > On Mon, Jul 26, 2010 at 1:16 PM, Michael McCandless
> > <lu...@mikemccandless.com> wrote:
> >>
> >> My random stress testing hit an IllegalArgExc because the random
> >> regexp was malformed.
> >>
> >> Does this patch look OK to fix?
> >>
> >> Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
> >> ===================================================================
> >> --- src/test/org/apache/lucene/search/TestRegexpRandom2.java
>  (revision
> >> 979227)
> >> +++ src/test/org/apache/lucene/search/TestRegexpRandom2.java    (working
> >> copy)
> >> @@ -130,6 +130,14 @@
> >>    * simple regexpquery implementation.
> >>    */
> >>   private void assertSame(String regexp) throws IOException {
> >> +    try {
> >> +      new RegExp(regexp);
> >> +    } catch (IllegalArgumentException iae) {
> >> +      // the random regexp could be malformed, eg "foo"bar",
> >> +      // so we ignore this
> >> +      return;
> >> +    }
> >> +
> >>     RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
> >>     DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field",
> regexp));
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: TestRegexpRandom2

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmm that doesn't fix it.

The magical seed is: 50686536365145364L (for TRR2).

I still see the IAE even if I remove that RegExp.NONE.

Mike

On Mon, Jul 26, 2010 at 1:28 PM, Robert Muir <rc...@gmail.com> wrote:
> I don't think we should do that. I think i found the problem:
> AutomatonTestUtil's randomRegexp does this:
>       try {
>         return new RegExp(regexp, RegExp.NONE);
>       } catch (Exception e) {}
> I think the RegExp.NONE is the problem, and we should remove it.
> because the test then toString's it and compiles it again, but without this
> option.
> On Mon, Jul 26, 2010 at 1:16 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> My random stress testing hit an IllegalArgExc because the random
>> regexp was malformed.
>>
>> Does this patch look OK to fix?
>>
>> Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
>> ===================================================================
>> --- src/test/org/apache/lucene/search/TestRegexpRandom2.java    (revision
>> 979227)
>> +++ src/test/org/apache/lucene/search/TestRegexpRandom2.java    (working
>> copy)
>> @@ -130,6 +130,14 @@
>>    * simple regexpquery implementation.
>>    */
>>   private void assertSame(String regexp) throws IOException {
>> +    try {
>> +      new RegExp(regexp);
>> +    } catch (IllegalArgumentException iae) {
>> +      // the random regexp could be malformed, eg "foo"bar",
>> +      // so we ignore this
>> +      return;
>> +    }
>> +
>>     RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
>>     DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: TestRegexpRandom2

Posted by Robert Muir <rc...@gmail.com>.
I don't think we should do that. I think i found the problem:

AutomatonTestUtil's randomRegexp does this:
      try {
        return new RegExp(regexp, RegExp.NONE);
      } catch (Exception e) {}

I think the RegExp.NONE is the problem, and we should remove it.
because the test then toString's it and compiles it again, but without this
option.

On Mon, Jul 26, 2010 at 1:16 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> My random stress testing hit an IllegalArgExc because the random
> regexp was malformed.
>
> Does this patch look OK to fix?
>
> Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
> ===================================================================
> --- src/test/org/apache/lucene/search/TestRegexpRandom2.java    (revision
> 979227)
> +++ src/test/org/apache/lucene/search/TestRegexpRandom2.java    (working
> copy)
> @@ -130,6 +130,14 @@
>    * simple regexpquery implementation.
>    */
>   private void assertSame(String regexp) throws IOException {
> +    try {
> +      new RegExp(regexp);
> +    } catch (IllegalArgumentException iae) {
> +      // the random regexp could be malformed, eg "foo"bar",
> +      // so we ignore this
> +      return;
> +    }
> +
>     RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
>     DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Robert Muir
rcmuir@gmail.com