Posted to dev@lucene.apache.org by Uwe Schindler <us...@apache.org> on 2016/03/26 20:11:11 UTC

RE: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Hi Alan, hi Robert, Hi Lucene developers,

I was able to reproduce the bug in isolation. The reason why Robert and you did not see it was quite simple:
- You need to enable a security manager
- You need to list all available locales first (before requesting the break iterator)

When you print the class name of the returned break iterator, with Java 8 or Java 9 b110 it returns: "class sun.util.locale.provider.DictionaryBasedBreakIterator"
With build 111 and no security manager, it prints: "class sun.util.locale.provider.DictionaryBasedBreakIterator" (all fine).
With build 111 and security manager enabled, it prints: "class sun.util.locale.provider.RuleBasedBreakIterator" (which is the wrong one for Thai).

Here is my test code:

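import java.text.BreakIterator;
import java.util.*;

public class Test {
  public static void main(String... args) throws Exception {
    // Enumerate all available locales first, as our test framework does.
    // The result is unused, but the enumeration is needed to trigger the bug.
    String[] availableLanguageTags = Arrays.stream(Locale.getAvailableLocales())
      .map(Locale::toLanguageTag)
      .sorted()
      .distinct()
      .toArray(String[]::new);
    // Request the Thai word break iterator and print its implementation class.
    BreakIterator iterator = BreakIterator.getWordInstance(new Locale("th"));
    System.out.println(iterator.getClass());
  }
}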

The availableLanguageTags computation is what our test framework does before running a test. It is needed to trigger the bug.

The other problem, with Farsi, has the same cause: without a security manager everything passes; with a security manager it fails. The Collator returned is just a default Collator, not the one for Arabic/Farsi text.

So it looks like the initialization code for locales is missing a doPrivileged() call somewhere. Maybe it was lost during the merge.
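
To illustrate what a missing doPrivileged() means in practice, here is a minimal sketch. It is not the actual JDK code. The class PrivilegedLookup and its readProperty helper are hypothetical names made up for illustration; only AccessController.doPrivileged and PrivilegedAction are real APIs (both deprecated in recent JDKs together with the security manager). The idea is that library code wraps a sensitive operation so that it runs with the library's own permissions, even when a less privileged caller is on the stack.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

public class PrivilegedLookup {
  // Hypothetical helper: reads a system property inside doPrivileged, so the
  // permission check stops at this class instead of walking up to the caller.
  @SuppressWarnings({"removal", "deprecation"})
  public static String readProperty(String key) {
    return AccessController.doPrivileged(
        (PrivilegedAction<String>) () -> System.getProperty(key));
  }

  public static void main(String[] args) {
    System.out.println(readProperty("java.version"));
  }
}
```

Without the doPrivileged wrapper, the same System.getProperty call would be checked against every frame on the stack, and untrusted test code further up would cause the check to fail under a security manager.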

Uwe

-----
Uwe Schindler
uschindler@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/


> -----Original Message-----
> From: Alan Bateman [mailto:Alan.Bateman@oracle.com]
> Sent: Saturday, March 26, 2016 3:10 PM
> To: Uwe Schindler <us...@apache.org>
> Cc: 'Rory O'Donnell' <ro...@oracle.com>; 'Core-Libs-Dev' <core-libs-
> dev@openjdk.java.net>; 'Robert Muir' <rc...@gmail.com>
> Subject: Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail
> with Farsi and Thai language
> 
> On 26/03/2016 11:56, Uwe Schindler wrote:
> > Hi,
> >
> > after also testing the separate "Jigsaw" build on jdk9.java.net I see the
> same problems. So both builds 111 are wrong.
> >
> > To me it looks like the Unicode data files are missing some information -
> which could again be a packaging bug. As said before, build 110 does not have
> this problem, so it seems to be a side-effect of Jigsaw merging.
> >
> > The following stuff does not work:
> >
> > (1) Thai's locale does not have working dictionary-based BreakIterator
> available. The following "check" in Lucene for this fails, because it cannot
> detect a boundary correctly:
> >
> >    /**
> >     * True if the JRE supports a working dictionary-based breakiterator for Thai.
> >     * If this is false, this tokenizer will not work at all!
> >     */
> >    public static final boolean DBBI_AVAILABLE;
> >    private static final BreakIterator proto = BreakIterator.getWordInstance(new Locale("th"));
> >    static {
> >      // check that we have a working dictionary-based break iterator for thai
> >      proto.setText("ภาษาไทย");
> >      DBBI_AVAILABLE = proto.isBoundary(4);
> >    }
> >
> > After this static initializer, DBBI_AVAILABLE is false. This causes some tests to be ignored, but 2 fail because of this (which might be an oversight on our side). Nevertheless, this is a bug in build 111.
> I just tried to duplicate this on OSX and Linux without success. The log
> you linked to suggests this is Linux, is that right? Is this the JDK
> bundle? I haven't checked the JRE bundle but would be surprised if anything
> is missing. The JDK has several tests for Thai, so if it were completely
> broken then I would have expected it to have been seen. I've no doubt
> that it is not working in your environment, we just need to figure out
> what is different.
> 
> >
> > (2) The collator for Arabic (Farsi) language fails to work correctly. This also
> looks like missing data.
> >
> > Collator collator = Collator.getInstance(new Locale("ar"));
> >
> Are there any exceptions or anything here? Or maybe it tests the
> collator with compare?
> 
> -Alan


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Posted by Uwe Schindler <us...@apache.org>.
Hi Alan,

 

I can confirm, the problem is fixed in build 113. Thanks!

 

Uwe

 

-----

Uwe Schindler

uschindler@apache.org 

ASF Member, Apache Lucene PMC / Committer

Bremen, Germany

http://lucene.apache.org/

 

From: Alan Bateman [mailto:Alan.Bateman@oracle.com] 
Sent: Sunday, March 27, 2016 11:50 AM
To: Uwe Schindler <us...@apache.org>
Cc: 'Rory O'Donnell' <ro...@oracle.com>; 'Core-Libs-Dev' <co...@openjdk.java.net>; 'Robert Muir' <rc...@gmail.com>; dev@lucene.apache.org
Subject: Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

 

On 26/03/2016 23:16, Uwe Schindler wrote:



: 

 

Now all analysis tests pass, also the small test program posted previously:

 

$ java -Djava.security.manager Test

class sun.util.locale.provider.DictionaryBasedBreakIterator

 

FYI: Lucene runs all tests with a security manager to enforce some restrictions (so tests can't escape their working dir, no useless permissions are required that may conflict,...). Lucene is designed to work with lowest permissions (except the memory mapping unmapper).

 

I will patch the Jenkins Server's JDK-9 b111 dirs the same way, so we can run tests.

 

I have the following question: Why don’t we see an exception when loading the locale data? Shouldn’t Java fail in some way and print a stack trace? It is just silent!

 

Thanks for confirming the workaround. I've created JDK-8152817 to track this issue and I hope we can get this fixed in jdk9/dev quickly.

The reason you aren't seeing the exception is that an internal class sun.util.Resources.loadBundleFromProviders is catching (and recording) the security exception. The cause may surface as the cause on a MissingResourceException but not in this case. This area has a lot of history, I'm sure Naoto can say more about this.

-Alan


RE: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Posted by Uwe Schindler <us...@apache.org>.
Thanks Alan.

 

I can confirm that Jenkins now runs Lucene/Solr tests fine. We had many runs last night, all succeeded – great!

 

About the new issue: Thanks for opening.

 

One addition: This affects not only BreakIterators but also the Arabic Collator, as found in the other failing test! It reproduces in the same way:

Collator collator = Collator.getInstance(new Locale("ar"));

The returned one is the one from the root locale (or something similar), not the Arabic one. We found the bug by “testing” the sort order, which failed because it is different from the default ROOT sorting. But you can also look at class names like I did for the BreakIterators.
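
A quick way to check whether a locale-specific collator was actually loaded could look like the sketch below. The class CollatorCheck and the method looksLikeRoot are hypothetical names, not part of Lucene or the JDK. The sketch assumes that both collators are java.text.RuleBasedCollator instances, in which case the class name alone may not tell them apart, so it compares the rule strings instead and falls back to equals() otherwise.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class CollatorCheck {
  // Returns true if the collator for the given locale carries the same rules
  // as the root collator, i.e. no locale-specific tailoring seems to be loaded.
  public static boolean looksLikeRoot(Locale locale) {
    Collator root = Collator.getInstance(Locale.ROOT);
    Collator candidate = Collator.getInstance(locale);
    if (root instanceof RuleBasedCollator && candidate instanceof RuleBasedCollator) {
      String rootRules = ((RuleBasedCollator) root).getRules();
      String candidateRules = ((RuleBasedCollator) candidate).getRules();
      return rootRules.equals(candidateRules);
    }
    // Assumption: for other implementations, equals() is the best we can do.
    return root.equals(candidate);
  }

  public static void main(String[] args) {
    Locale arabic = Locale.forLanguageTag("ar");
    System.out.println(Collator.getInstance(arabic).getClass());
    System.out.println("same rules as ROOT: " + looksLikeRoot(arabic));
  }
}
```

Run it once with and once without -Djava.security.manager on the affected build; if the bug is present, the second line would be expected to differ between the two runs.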

 

I wish you happy Easter!

 

Uwe

 

-----

Uwe Schindler

uschindler@apache.org 

ASF Member, Apache Lucene PMC / Committer

Bremen, Germany

http://lucene.apache.org/

 


Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Posted by Alan Bateman <Al...@oracle.com>.
On 26/03/2016 23:16, Uwe Schindler wrote:
> :
>
> Now all analysis tests pass, also the small test program posted 
> previously:
>
> $ java -Djava.security.manager Test
>
> class sun.util.locale.provider.DictionaryBasedBreakIterator
>
> FYI: Lucene runs all tests with a security manager to enforce some 
> restrictions (so tests can't escape their working dir, no useless 
> permissions are required that may conflict,...). Lucene is designed to 
> work with lowest permissions (except the memory mapping unmapper).
>
> I will patch the Jenkins Server's JDK-9 b111 dirs the same way, so we 
> can run tests.
>
> I have the following question: Why don’t we see an exception when 
> loading the locale data? Shouldn’t Java fail in some way and print a 
> stack trace? It is just silent!
>
>
Thanks for confirming the workaround. I've created JDK-8152817 to 
track this issue and I hope we can get this fixed in jdk9/dev quickly.

The reason you aren't seeing the exception is that an internal class 
sun.util.Resources.loadBundleFromProviders is catching (and recording) 
the security exception. The cause may surface as the cause on a 
MissingResourceException but not in this case. This area has a lot of 
history, I'm sure Naoto can say more about this.
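
For cases where a MissingResourceException does surface, a caller can inspect its cause as sketched below. This is only an illustration of the general pattern: the class BundleCauseCheck, the method describe, and the bundle name "hypothetical.NoSuchBundle" are made up. It does not reproduce the locale data problem discussed here, where no exception reaches the caller at all.

```java
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class BundleCauseCheck {
  // Tries to load a bundle and reports whether a failure carries a cause.
  public static String describe(String baseName) {
    try {
      ResourceBundle.getBundle(baseName);
      return "loaded";
    } catch (MissingResourceException e) {
      Throwable cause = e.getCause();
      return "missing, cause: " + (cause == null ? "none" : cause.getClass().getName());
    }
  }

  public static void main(String[] args) {
    // "hypothetical.NoSuchBundle" is a made-up name that should not resolve.
    System.out.println(describe("hypothetical.NoSuchBundle"));
  }
}
```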

-Alan

RE: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Posted by Uwe Schindler <us...@apache.org>.
Hi Alan,

 

I added the following to jdk-9/conf/security/java.policy as the first line under localedata:

 

grant codeBase "jrt:/jdk.localedata" {
        permission java.lang.RuntimePermission "getClassLoader";
        permission java.lang.RuntimePermission "accessClassInPackage.sun.text.*";
        permission java.lang.RuntimePermission "accessClassInPackage.sun.util.*";
        permission java.util.PropertyPermission "*", "read";
};

 

Now all analysis tests pass, also the small test program posted previously:

 

$ java -Djava.security.manager Test

class sun.util.locale.provider.DictionaryBasedBreakIterator

 

FYI: Lucene runs all tests with a security manager to enforce some restrictions (so tests can't escape their working dir, no useless permissions are required that may conflict,...). Lucene is designed to work with lowest permissions (except the memory mapping unmapper).

 

I will patch the Jenkins Server's JDK-9 b111 dirs the same way, so we can run tests.

 

I have the following question: Why don’t we see an exception when loading the locale data? Shouldn’t Java fail in some way and print a stack trace? It is just silent!

 

Uwe

 

-----

Uwe Schindler

uschindler@apache.org 

ASF Member, Apache Lucene PMC / Committer

Bremen, Germany

http://lucene.apache.org/

 

 

> -----Original Message-----
> From: Alan Bateman [mailto:Alan.Bateman@oracle.com]
> Sent: Saturday, March 26, 2016 11:05 PM
> To: Uwe Schindler <us...@apache.org>
> Cc: 'Rory O'Donnell' <ro...@oracle.com>; 'Core-Libs-Dev' <core-libs-dev@openjdk.java.net>; 'Robert Muir' <rc...@gmail.com>; dev@lucene.apache.org
> Subject: Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language
>
> On 26/03/2016 19:11, Uwe Schindler wrote:
> > Hi Alan, hi Robert, Hi Lucene developers,
> >
> > I was able to reproduce the bug in isolation. The reason why Robert and you did not see it was quite simple:
> > - You need to enable a security manager
> > - You need to list all locales before
> >
> Thanks, that's enough to understand the issue. There is code in
> ResourceBundleProviderSupport trying to do a privileged operation with
> less privileged code on the stack.
>
> As a temporary workaround, could you update the policy file
> (conf/security/java.policy) to grant an additional permission to
> jrt:/jdk.localedata
>    permission java.lang.RuntimePermission "getClassLoader";
>
> I'll create a bug for the issue now.
>
> -Alan.


Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Posted by Alan Bateman <Al...@oracle.com>.
On 26/03/2016 19:11, Uwe Schindler wrote:
> Hi Alan, hi Robert, Hi Lucene developers,
>
> I was able to reproduce the bug in isolation. The reason why Robert and you did not see it was quite simple:
> - You need to enable a security manager
> - You need to list all locales before
>
>
Thanks, that's enough to understand the issue. There is code in 
ResourceBundleProviderSupport trying to do a privileged operation with 
less privileged code on the stack.

As a temporary workaround, could you update the policy file 
(conf/security/java.policy) to grant an additional permission to 
jrt:/jdk.localedata
   permission java.lang.RuntimePermission "getClassLoader";

I'll create a bug for the issue now.

-Alan.

