Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2019/12/04 19:32:03 UTC

More on "ant regenerate" target:

I have the git pull working for fetching a particular revision of nfkc.txt and the like, but now TestICUFoldingFilterFactory fails. Here's what I could find on that topic:

org.apache.lucene.analysis.icu.ICUFoldingFilter
  public static final Normalizer2 NORMALIZER = Normalizer2.getInstance(
    // TODO: if the wrong version of the ICU jar is used, loading these data files may give a strange error.
    // maybe add an explicit check? http://icu-project.org/apiref/icu4j/com/ibm/icu/util/VersionInfo.html
    ICUFoldingFilter.class.getResourceAsStream("utr30.nrm"),
    "utr30", Normalizer2.Mode.COMPOSE);
eventually calls: 

com.ibm.icu.impl.Normalizer2Impl
 public Normalizer2Impl load(ByteBuffer bytes) {
    try {
      this.dataVersion = ICUBinary.readHeaderAndDataVersion(bytes, 1316121906, IS_ACCEPTABLE);
which throws
Caused by: com.ibm.icu.util.ICUUncheckedIOException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 4.0.0.0

0x4e726d32==1316121906, so the data format looks ok to my uninformed eye.
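
For reference, the equality above can be checked mechanically. This is only an illustrative sketch: the class name IcuDataFormat and the decode helper are made up for the example and are not part of ICU or Lucene. It compares the two constants and decodes the four bytes as ASCII.

```java
import java.nio.charset.StandardCharsets;

class IcuDataFormat {
    /** Decodes a 32-bit data format ID into its four ASCII characters (hypothetical helper). */
    static String decode(int dataFormat) {
        byte[] b = {
            (byte) (dataFormat >>> 24),
            (byte) (dataFormat >>> 16),
            (byte) (dataFormat >>> 8),
            (byte) dataFormat
        };
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int expected = 1316121906;  // constant passed to readHeaderAndDataVersion
        int reported = 0x4e726d32;  // "data format" from the exception message
        System.out.println(expected == reported);  // true
        System.out.println(decode(reported));      // Nrm2
    }
}
```

Since the data format matches, the rejected part is presumably something else in the header, such as the format version (4.0.0.0) reported in the same message.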

The jar file I have for icu is: icu4j-62.1.jar

I looked at the nfc* files that are now fetched from github and at least ./lucene/analysis/icu/src/data/utr30/nfc.txt is identical.

I’ll get back to this later this afternoon; meanwhile, any pointers?
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: More on "ant regenerate" target:

Posted by Robert Muir <rc...@gmail.com>.
IMO we should open an issue to make the regenerate task do some kind of
check and fail in a clear way if this happens. It would save people from
crazy debugging.
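
A rough sketch of what such a check might look like, not a proposal for the actual build code. It assumes the usual ICU data header layout (magic bytes 0xda 0x27 at offsets 2 and 3, the 4-byte data format at offset 12, the 4-byte format version at offset 16). The class IcuHeaderPeek, its methods, and the expectedMajor parameter are all hypothetical; the format version a given icu4j release accepts would have to be looked up, it is not hard-coded here.

```java
import java.nio.charset.StandardCharsets;

class IcuHeaderPeek {
    /** Returns the 4-character data format, e.g. "Nrm2" (offsets are an assumption, see above). */
    static String dataFormat(byte[] header) {
        checkMagic(header);
        return new String(header, 12, 4, StandardCharsets.US_ASCII);
    }

    /** Returns the four format version fields, e.g. {4, 0, 0, 0}. */
    static int[] formatVersion(byte[] header) {
        checkMagic(header);
        return new int[] {
            header[16] & 0xff, header[17] & 0xff, header[18] & 0xff, header[19] & 0xff
        };
    }

    /** Fails with a readable message when the major format version is not the expected one. */
    static void requireFormatMajor(byte[] header, int expectedMajor) {
        int[] v = formatVersion(header);
        if (v[0] != expectedMajor) {
            throw new IllegalStateException(String.format(
                "%s data has format version %d.%d.%d.%d but %d.x was expected; "
                    + "check that the icu4c tools on PATH match the icu4j jar",
                dataFormat(header), v[0], v[1], v[2], v[3], expectedMajor));
        }
    }

    private static void checkMagic(byte[] header) {
        if (header.length < 20 || (header[2] & 0xff) != 0xda || (header[3] & 0xff) != 0x27) {
            throw new IllegalArgumentException("not an ICU data file (bad magic bytes)");
        }
    }

    /** Builds a synthetic 20-byte header for demonstration; not real utr30.nrm content. */
    static byte[] sampleHeader(int major) {
        byte[] h = new byte[20];
        h[2] = (byte) 0xda;
        h[3] = (byte) 0x27;
        h[12] = 'N'; h[13] = 'r'; h[14] = 'm'; h[15] = '2';
        h[16] = (byte) major;
        return h;
    }

    public static void main(String[] args) {
        byte[] h = sampleHeader(4);
        System.out.println(dataFormat(h));  // Nrm2
        requireFormatMajor(h, 4);           // passes silently
        requireFormatMajor(h, 3);           // throws IllegalStateException with a clear message
    }
}
```

In a real check the first 20 bytes would come from the generated utr30.nrm rather than from sampleHeader.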

On Sat, Dec 7, 2019 at 11:14 AM Robert Muir <rc...@gmail.com> wrote:

> Hi Erick. Sorry for the slow reply on this one. Make sure you have the
> correct icu4c version at the beginning of your PATH before running ant
> regenerate. It should match the icu4j version; it seems to me you have a
> mismatch.

Re: More on "ant regenerate" target:

Posted by Robert Muir <rc...@gmail.com>.
Hi Erick. Sorry for the slow reply on this one. Make sure you have the correct
icu4c version at the beginning of your PATH before running ant regenerate.
It should match the icu4j version; it seems to me you have a mismatch.
