You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2012/05/26 21:23:23 UTC

[jira] [Created] (LUCENE-4078) PatternReplaceCharFilter assertion error

Dawid Weiss created LUCENE-4078:
-----------------------------------

             Summary: PatternReplaceCharFilter assertion error
                 Key: LUCENE-4078
                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Dawid Weiss
            Assignee: Dawid Weiss
            Priority: Minor
             Fix For: 4.0


Build: https://builds.apache.org/job/Lucene-trunk/1942/

1 tests failed.
REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings

Error Message:


Stack Trace:
java.lang.AssertionError
       at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
       at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
       at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
       at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
       at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:616)
       at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
       at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
       at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
       at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
       at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
       at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
       at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
       at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
We could add a check at the random pattern level generation which
would apply the pattern to a complex unicode string and then verify
it's valid utf16 afterwards. If it's not, the pattern would be picked
again?

Dawid

On Sun, May 27, 2012 at 2:49 PM, Robert Muir <rc...@gmail.com> wrote:
> there is another situation, with the html stripper where we turn this assert
> into an assume. I don't have the code in front of me, but I think it would
> be good to just add this as a toggle to mocktokenizer (throw
> assumptionviolated in this case), if that's not how it works already.
>
> On May 27, 2012 4:51 AM, "Dawid Weiss (JIRA)" <ji...@apache.org> wrote:
>>
>>
>>    [
>> https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284124#comment-13284124
>> ]
>>
>> Dawid Weiss commented on LUCENE-4078:
>> -------------------------------------
>>
>> Thanks for the insight -- interesting.
>>
>> bq. is tangential to the problem – which seems to be that the JVM lets the
>> empty pattern split in between chars instead of codepoints, which seems like
>> a bug.
>>
>> Absolutely. This seems like a bug to me too.
>>
>> > PatternReplaceCharFilter assertion error
>> > ----------------------------------------
>> >
>> >                 Key: LUCENE-4078
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>> >             Project: Lucene - Java
>> >          Issue Type: Bug
>> >            Reporter: Dawid Weiss
>> >            Assignee: Dawid Weiss
>> >            Priority: Minor
>> >             Fix For: 4.0
>> >
>> >
>> > Build: https://builds.apache.org/job/Lucene-trunk/1942/
>> > 1 tests failed.
>> > REGRESSION:
>> >  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
>> > Error Message:
>> > Stack Trace:
>> > java.lang.AssertionError
>> >        at
>> > __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>> >        at
>> > org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>> >        at
>> > org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>> >        at
>> > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>> >        at
>> > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>> >        at
>> > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>> >        at
>> > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>> >        at
>> > org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >        at
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> >        at
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >        at java.lang.reflect.Method.invoke(Method.java:616)
>> >        at
>> > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>> >        at
>> > com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>> >        at
>> > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>> >        at
>> > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>> >        at
>> > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>> >        at
>> > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> >        at
>> > org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>> >        at
>> > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>> >        at
>> > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> >        at
>> > org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>> >        at
>> > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>> >        at
>> > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> >        at
>> > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by Robert Muir <rc...@gmail.com>.
there is another situation, with the html stripper where we turn this
assert into an assume. I don't have the code in front of me, but I think it
would be good to just add this as a toggle to mocktokenizer (throw
assumptionviolated in this case), if that's not how it works already.
On May 27, 2012 4:51 AM, "Dawid Weiss (JIRA)" <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284124#comment-13284124]
>
> Dawid Weiss commented on LUCENE-4078:
> -------------------------------------
>
> Thanks for the insight -- interesting.
>
> bq. is tangential to the problem – which seems to be that the JVM lets the
> empty pattern split in between chars instead of codepoints, which seems
> like a bug.
>
> Absolutely. This seems like a bug to me too.
>
> > PatternReplaceCharFilter assertion error
> > ----------------------------------------
> >
> >                 Key: LUCENE-4078
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >            Reporter: Dawid Weiss
> >            Assignee: Dawid Weiss
> >            Priority: Minor
> >             Fix For: 4.0
> >
> >
> > Build: https://builds.apache.org/job/Lucene-trunk/1942/
> > 1 tests failed.
> > REGRESSION:
>  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> > Error Message:
> > Stack Trace:
> > java.lang.AssertionError
> >        at
> __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
> >        at
> org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
> >        at
> org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
> >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
> >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
> >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
> >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
> >        at
> org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >        at java.lang.reflect.Method.invoke(Method.java:616)
> >        at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
> >        at
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
> >        at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
> >        at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
> >        at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
> >        at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> >        at
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
> >        at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> >        at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> >        at
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> >        at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> >        at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> >        at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284087#comment-13284087 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

bq. I'm not really following you there ... '|' is the OR operator, so the regex "|" is a redundant way of saying "" which is "the empty pattern" or a way of saying "match the empty string".

Yeah, I am a bit surprised at what "" matches. It doesn't match an empty string. It matches an empty string in between characters... or in other words, it matches what's not there. Makes sense when you think of it.

As for '|', I looked at it from automata theory point of view -- '|' doesn't need any arguments or post-arguments (or states), unlike '+', '*' or the like which need a state to reference. I'd be convinced '|' is a consistent way of saying 'match empty string or empty string' if "+" pattern worked ("match empty string one or more times"), but it doesn't -- this will fail with an error. So '|' is kind of special here.

I don't know much about regexp theory to argue if I'm right or wrong though. I don't even think there is one "right" way to do things if this is a true quote:

I define UNIX as “30 definitions of regular expressions living under one roof.” —Don Knuth

Dawid
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-4078:
--------------------------------

    Attachment: LUCENE-4078.patch

This doesn't filter out all invalid possibilities but should protect against some (including the one that caused this issue)?
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-4078.patch
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284080#comment-13284080 ] 

Hoss Man commented on LUCENE-4078:
----------------------------------

bq. When you think of it '|' is not an operator ...

I'm not really following you there ... '|' is the OR operator, so the regex "|" is a redundant way of saying "" which is "the empty pattern" or a way of saying "match the empty string".  

Consider in particular when the regex is used for a replace or split type operation: "z|" means "match z or the empty string"...

{code}
$ perl -MData::Dumper -le 'print Dumper split /z|/, "ABzCD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

You should see similar results in java with if you use Matcher.find() as an iterator.

As for the original pattern: "]|" -- that's just a convince form of "\]|" .. an unescaped close bracket (that has no matching open bracket) is treated as a literal...

{code}
$ perl -MData::Dumper -le 'print Dumper split /]|/, "AB]CD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

What i can't explain, is why java treats "empty string" as something that matches in the middle of a code point.  that certainly sounds like bug, unless there is some subtlety in Unicode TR#18 that i'm not seeing...

http://www.unicode.org/reports/tr18/
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284113#comment-13284113 ] 

Hoss Man commented on LUCENE-4078:
----------------------------------

bq. It doesn't match an empty string. It matches an empty string in between characters...

Well, it's more complicated then that.  it *does* match the empty string (in the sense of "does this regex match this entire string which happens to be empty) but in the context of "find" or "replace" on a larger string you are correct that it matches nothing, which means it matches the emptiness between characters.

bq. I'd be convinced '|' is a consistent way of saying 'match empty string or empty string' if "+" pattern worked ("match empty string one or more times"), but it doesn't -- this will fail with an error. So '|' is kind of special here.

I think that's just a fluke of syntax/precedence ... if you use parens (capturing or otherwise) you can say "match the empty pattern 1 or more times)...

{code}
$ perl -MData::Dumper -le 'print Dumper split /(?:)+/, "ABCD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

Bottom Line: these patterns are all valid and meaningful, and everything we've discussed is tangential to the problem -- which seems to be that the JVM lets the empty pattern split in between chars instead of codepoints, which seems like a bug.
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284075#comment-13284075 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

More insanity:
{code}
    String s1 = "AB\uD840\uDC00C";
    String s2 = s1.replaceAll("", "xyz");
{code}
{code}
    String s1 = "AB\uD840\uDC00C";
    String s2 = s1.replaceAll("|||||||||||||||||||||||||||||||", "xyz");
{code}
These are equivalent. :)

When you think of it '|' is not an operator, it is an alternative that simply splits (the automaton) into two conditional paths. Since preconditions and post-conditions are empty, these paths converge into an empty state. Which matches on every character. Which seems to be a bug because it matches on every char[] instead of codepoint ('.' matches each codepoint).

Don't know what to make of it.
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284193#comment-13284193 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

I checked python3 and it works according to the intuition (works on codepoints). I also asked on openjdk's i18n mailing list about where to file it.

In the mean time I'll fix it by looping the random pattern generator until it processes a non-bmp character so that the returned string is valid utf16.
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284124#comment-13284124 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

Thanks for the insight -- interesting.

bq. is tangential to the problem – which seems to be that the JVM lets the empty pattern split in between chars instead of codepoints, which seems like a bug.

Absolutely. This seems like a bug to me too.
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-4078.
---------------------------------

    Resolution: Incomplete

This is very likely a JDK bug. I committed a patch that works around some of the trivial random patterns but it's definitely not a full resolution of the problem.
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-4078.patch
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Comment Edited] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284080#comment-13284080 ] 

Hoss Man edited comment on LUCENE-4078 at 5/26/12 10:27 PM:
------------------------------------------------------------

bq. When you think of it '|' is not an operator ...

I'm not really following you there ... '|' is the OR operator, so the regex "|" is a redundant way of saying "" which is "the empty pattern" or a way of saying "match the empty string".  

Consider in particular when the regex is used for a replace or split type operation: "z|" means "match z or the empty string"...

{code}
$ perl -MData::Dumper -le 'print Dumper split /z|/, "ABzCD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

You should see similar results in java with if you use Matcher.find() as an iterator.

As for the original pattern: "]|" -- that's just a convince form of (imposible to write this in jira markup w/o code tags)...
{code}\]|{code}
 .. an unescaped close bracket (that has no matching open bracket) is treated as a literal...

{code}
$ perl -MData::Dumper -le 'print Dumper split /]|/, "AB]CD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

What i can't explain, is why java treats "empty string" as something that matches in the middle of a code point.  that certainly sounds like bug, unless there is some subtlety in Unicode TR#18 that i'm not seeing...

http://www.unicode.org/reports/tr18/
                
      was (Author: hossman):
    bq. When you think of it '|' is not an operator ...

I'm not really following you there ... '|' is the OR operator, so the regex "|" is a redundant way of saying "" which is "the empty pattern" or a way of saying "match the empty string".  

Consider in particular when the regex is used for a replace or split type operation: "z|" means "match z or the empty string"...

{code}
$ perl -MData::Dumper -le 'print Dumper split /z|/, "ABzCD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

You should see similar results in java with if you use Matcher.find() as an iterator.

As for the original pattern: "]|" -- that's just a convince form of "\]|" .. an unescaped close bracket (that has no matching open bracket) is treated as a literal...

{code}
$ perl -MData::Dumper -le 'print Dumper split /]|/, "AB]CD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';
{code}

What i can't explain, is why java treats "empty string" as something that matches in the middle of a code point.  that certainly sounds like bug, unless there is some subtlety in Unicode TR#18 that i'm not seeing...

http://www.unicode.org/reports/tr18/
                  
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291996#comment-13291996 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

Follow-up discussion on core-libs-dev. The bottom line: this is the expected behavior... 

http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-June/010455.html
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-4078.patch
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284072#comment-13284072 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

I just checked perl... these do seem to be valid regexps! (they work in perl...) I don't know how to interpret them though. 
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284070#comment-13284070 ] 

Dawid Weiss commented on LUCENE-4078:
-------------------------------------

This bug is caused by Robert's evil random Pattern instances. I think it qualifies as a bug in the JDK's implementation... but then, I'm not sure. Here it is -- the Pattern:
{code}
Pattern p = Pattern.compile("]|");
{code}
For those of you who wondered, this also is a valid pattern:
{code}
Pattern p = Pattern.compile("|");
{code}
What should these match? I have no clue. Are they even valid? I have no clue. Anyway, what happens is that surrogate pairs (and everything else) is divided, so:
{code}
    String s1 = "AB\uD840\uDC00C";
    String s2 = s1.replaceAll("]|", "xyz");
    System.out.println(s1);
    System.out.println(s2);
{code}
results in:
{noformat}
AB𠀀C
xyzAxyzBxyz?xyz?xyzCxyz
{noformat}
and an assertion is righteously thrown saying a surrogate pair is broken.

Don't know what to do with this. Robert?
                
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>
>                 Key: LUCENE-4078
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4078
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
>
>
> Build: https://builds.apache.org/job/Lucene-trunk/1942/
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:153)
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:123)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:558)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:488)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:430)
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:424)
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:323)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org