Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/11/16 17:07:58 UTC
[jira] [Updated] (LUCENE-7540) Upgrade ICU to 58.1
[ https://issues.apache.org/jira/browse/LUCENE-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-7540:
---------------------------------------
Attachment: LUCENE-7540.patch
I attempted to upgrade to ICU 58.1 (see attached patch), and ran {{ant regenerate}}, but our evil {{checkRandomData}} test is tripping assertions in ICU's {{RuleBasedBreakIterator.java}}:
{noformat}
[junit4] 2> ??? 16, 2016 6:56:39 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-3,5,TGRP-TestICUTokenizer]
[junit4] 2> java.lang.AssertionError
[junit4] 2> at __randomizedtesting.SeedInfo.seed([34D64859D1A7CD98]:0)
[junit4] 2> at com.ibm.icu.text.RuleBasedBreakIterator.checkDictionary(RuleBasedBreakIterator.java:544)
[junit4] 2> at com.ibm.icu.text.RuleBasedBreakIterator.next(RuleBasedBreakIterator.java:428)
[junit4] 2> at org.apache.lucene.analysis.icu.segmentation.BreakIteratorWrapper$RBBIWrapper.next(BreakIteratorWrapper.java:96)
[junit4] 2> at org.apache.lucene.analysis.icu.segmentation.CompositeBreakIterator.next(CompositeBreakIterator.java:65)
[junit4] 2> at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.incrementTokenBuffer(ICUTokenizer.java:210)
[junit4] 2> at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.incrementToken(ICUTokenizer.java:104)
[junit4] 2> at org.apache.lucene.analysis.icu.ICUNormalizer2Filter.incrementToken(ICUNormalizer2Filter.java:80)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:183)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:301)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:305)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:829)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:628)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:61)
[junit4] 2> at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:496)
[junit4] 2>
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestICUTokenizer -Dtests.method=testRandomHugeStrings -Dtests.seed=34D64859D1A7CD98 -Dtests.locale=ar-QA -Dtests.timezone=Africa/Bujumbura -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
{noformat}
I had previously built icu4c 58.1 from sources and installed it on my dev box so that its generation tools (e.g. {{gennorm2}}) are available ... so maybe I messed something up in that process, or maybe this is an ICU bug?
> Upgrade ICU to 58.1
> -------------------
>
> Key: LUCENE-7540
> URL: https://issues.apache.org/jira/browse/LUCENE-7540
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7540.patch
>
>
> ICU is up to 58.1, but our ICU analysis components currently use 56.1, which is ~1 year old by now.