Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/11/16 17:07:58 UTC

[jira] [Updated] (LUCENE-7540) Upgrade ICU to 58.1

     [ https://issues.apache.org/jira/browse/LUCENE-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-7540:
---------------------------------------
    Attachment: LUCENE-7540.patch

I attempted to upgrade to ICU 58.1 (see attached patch), and ran {{ant regenerate}}, but our evil {{checkRandomData}} test is tripping assertions in ICU's {{RuleBasedBreakIterator.java}}:

{noformat}
   [junit4]   2> ??? 16, 2016 6:56:39 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: Thread[Thread-3,5,TGRP-TestICUTokenizer]
   [junit4]   2> java.lang.AssertionError
   [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([34D64859D1A7CD98]:0)
   [junit4]   2> 	at com.ibm.icu.text.RuleBasedBreakIterator.checkDictionary(RuleBasedBreakIterator.java:544)
   [junit4]   2> 	at com.ibm.icu.text.RuleBasedBreakIterator.next(RuleBasedBreakIterator.java:428)
   [junit4]   2> 	at org.apache.lucene.analysis.icu.segmentation.BreakIteratorWrapper$RBBIWrapper.next(BreakIteratorWrapper.java:96)
   [junit4]   2> 	at org.apache.lucene.analysis.icu.segmentation.CompositeBreakIterator.next(CompositeBreakIterator.java:65)
   [junit4]   2> 	at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.incrementTokenBuffer(ICUTokenizer.java:210)
   [junit4]   2> 	at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.incrementToken(ICUTokenizer.java:104)
   [junit4]   2> 	at org.apache.lucene.analysis.icu.ICUNormalizer2Filter.incrementToken(ICUNormalizer2Filter.java:80)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:183)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:301)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:305)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:829)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:628)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:61)
   [junit4]   2> 	at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:496)
   [junit4]   2> 
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestICUTokenizer -Dtests.method=testRandomHugeStrings -Dtests.seed=34D64859D1A7CD98 -Dtests.locale=ar-QA -Dtests.timezone=Africa/Bujumbura -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
{noformat}

I had previously built icu4c 58.1 from source and installed it on my dev box so that its generation tools (e.g. {{gennorm2}}) are available. So maybe I messed something up in that process, or maybe this is an ICU bug?
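One way to separate ICU's own behavior from Lucene's tokenizer chain would be to drive a break iterator directly with random text and check basic boundary invariants. This is only a hypothetical sketch: the class {{BreakStress}} and its methods are made up for illustration, it is not Lucene's {{checkRandomData}}, and it uses the JDK's {{java.text.BreakIterator}} so it runs without ICU4J on the classpath.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical random-input stress check for a word break iterator, in the spirit of
 * (but not copied from) Lucene's checkRandomData. Uses java.text.BreakIterator as a
 * stand-in; ICU4J's com.ibm.icu.text.BreakIterator follows the same first()/next()/DONE pattern.
 */
public class BreakStress {

  // Code point ranges to sample: ASCII letters, space, Thai (dictionary-segmented), CJK, emoji.
  private static final int[][] RANGES = {
      {'a', 'z'}, {' ', ' '}, {0x0E01, 0x0E2E}, {0x4E00, 0x4FFF}, {0x1F600, 0x1F64F}
  };

  /** Builds a random string of 0..maxCodePoints code points drawn from RANGES. */
  static String randomText(Random random, int maxCodePoints) {
    int n = random.nextInt(maxCodePoints + 1);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
      int[] range = RANGES[random.nextInt(RANGES.length)];
      sb.appendCodePoint(range[0] + random.nextInt(range[1] - range[0] + 1));
    }
    return sb.toString();
  }

  /** Returns every boundary the word iterator reports, failing fast if an invariant breaks. */
  static int[] boundaries(String text) {
    BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
    it.setText(text);
    List<Integer> out = new ArrayList<>();
    int prev = it.first();
    if (prev != 0) {
      throw new AssertionError("first boundary should be 0, got " + prev);
    }
    out.add(prev);
    // Each boundary must strictly increase and stay within the text.
    for (int b = it.next(); b != BreakIterator.DONE; b = it.next()) {
      if (b <= prev || b > text.length()) {
        throw new AssertionError("bad boundary " + b + " after " + prev + ", length " + text.length());
      }
      out.add(b);
      prev = b;
    }
    // The last boundary reported before DONE must be the end of the text.
    if (prev != text.length()) {
      throw new AssertionError("last boundary " + prev + " != length " + text.length());
    }
    return out.stream().mapToInt(Integer::intValue).toArray();
  }

  /** Checks many random strings from one seeded Random, so a failure can be replayed. */
  static void stress(Random random, int iterations, int maxCodePoints) {
    for (int i = 0; i < iterations; i++) {
      String text = randomText(random, maxCodePoints);
      try {
        boundaries(text);
      } catch (AssertionError e) {
        throw new AssertionError("failed on iteration " + i + ": " + e.getMessage(), e);
      }
    }
  }

  public static void main(String[] args) {
    // Fixed default seed (an arbitrary choice); pass another seed as args[0] to vary inputs.
    long seed = args.length > 0 ? Long.parseLong(args[0]) : 42L;
    stress(new Random(seed), 1000, 200);
    System.out.println("ok, seed=" + seed);
  }
}
```

Since ICU4J's {{com.ibm.icu.text.BreakIterator}} also offers {{getWordInstance(Locale)}}, {{setText}}, {{first}}, {{next}} and {{DONE}}, swapping the import should point the same loop at ICU's {{RuleBasedBreakIterator}}; running with {{-ea}} would presumably let an internal ICU assertion like the {{checkDictionary}} one in the trace surface as an {{AssertionError}} thrown from {{next()}}. Thai is included in the sampled ranges on the assumption that the dictionary-based segmentation path is the one involved.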

> Upgrade ICU to 58.1
> -------------------
>
>                 Key: LUCENE-7540
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7540
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7540.patch
>
>
> ICU is up to 58.1, but our ICU analysis components currently use 56.1, which is ~1 year old by now.


