Posted to dev@stdcxx.apache.org by Martin Sebor <se...@roguewave.com> on 2008/03/27 15:51:44 UTC

Re: [Stdcxx Wiki] Update of "LocaleLookup" by TravisVitek

Apache Wiki wrote:
> Dear Wiki user,
> 
> You have subscribed to a wiki page or wiki category on "Stdcxx Wiki" for change notification.
> 
> The following page has been changed by TravisVitek:
> http://wiki.apache.org/stdcxx/LocaleLookup
> 
> ------------------------------------------------------------------------------
>   
>   The objective of this project is to provide an interface that makes it easy to write localization tests that provide sufficient code coverage and complete in a reasonable amount of time (ideally seconds as opposed to minutes), without knowledge of platform-specific details such as locale names. The interface must make it easy to query the system for locales that satisfy the specific requirements of each test. For example, most tests that currently use all installed locales (e.g., the set of tests for the `std::ctype` facet) only need to exercise a representative sample of the installed locales without using the same locale more than once, so the interface will need to make it possible to specify such a sample. Another example is tests that attempt to exercise locales in multibyte encodings whose `MB_CUR_MAX` ranges from 1 to 6 (some of the `std::codecvt` facet tests). The new interface will need to make it easy to specify such a set of locales without explicitly naming them, and it will need to retrieve such locales without returning duplicates.
>   
> + [[Anchor(UseCases)]]
> + == Use Cases ==
> + 
> + The existing locale tests select locales based on a few different criteria. Below is a list of locale tests and the criteria used for locale selection within those tests.
> + 

Look at you: you've become a regular wiki formatting artist! ;-)

> + || Test || Criteria ||

Great breakdown. I assume the Criteria are the current conditions.
We should also try to come up with the ideal conditions that we'll
want to implement using the new API.

> + || 22.LOCALE.CODECVT.MT.CPP || *1,+ ||

Any particular reason these names are all caps? :)

> + || 22.LOCALE.CODECVT.OUT.CPP || *2 ||

IIRC, this test tries to exercise all values of MB_CUR_MAX, not
just the largest.
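A minimal sketch of selecting one locale for each distinct MB_CUR_MAX value, rather than only the locale with the largest one, is shown below. It assumes a list of candidate names is already available (for instance, from `locale -a`); the helper name `pick_one_per_mb_cur_max` is hypothetical and not part of the stdcxx test driver.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>   // MB_CUR_MAX
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: map each MB_CUR_MAX value seen among the candidate
// locales to the first candidate that produced it, so that every width is
// exercised exactly once.
std::map<std::size_t, std::string>
pick_one_per_mb_cur_max (const std::vector<std::string>& candidates)
{
    std::map<std::size_t, std::string> by_width;

    // copy the current LC_CTYPE name; later setlocale() calls may
    // overwrite the buffer it points to
    const char* const saved = std::setlocale (LC_CTYPE, 0);
    assert (0 != saved);
    const std::string saved_name (saved);

    for (std::size_t i = 0; i != candidates.size (); ++i) {
        // skip names the C library does not recognize
        if (0 == std::setlocale (LC_CTYPE, candidates [i].c_str ()))
            continue;

        // MB_CUR_MAX reflects the LC_CTYPE category just installed
        const std::size_t width = MB_CUR_MAX;

        // insert() does not replace an existing entry: first match wins
        by_width.insert (std::make_pair (width, candidates [i]));
    }

    // restore the original LC_CTYPE setting
    std::setlocale (LC_CTYPE, saved_name.c_str ());
    return by_width;
}
```

Because `std::map::insert` never overwrites an existing key, the order of the candidate list decides which locale represents each width.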

> + || 22.LOCALE.CONS.MT.CPP || *1,+ ||
> + || 22.LOCALE.CTYPE.CPP || *2 ||
> + || 22.LOCALE.CTYPE.IS.CPP || *2 ||
> + || 22.LOCALE.CTYPE.MT.CPP || *1,+ ||
> + || 22.LOCALE.CTYPE.NARROW.CPP || *2 ||
> + || 22.LOCALE.CTYPE.SCAN.CPP || *2 ||
> + || 22.LOCALE.CTYPE.TOLOWER.CPP || *2 ||
> + || 22.LOCALE.CTYPE.TOUPPER.CPP || *2 ||

I thought the ctype tests were being run in all installed locales,
just like the numpunct one? Which is what we want to move away from.
IMO, exercising a small set (less than a dozen) of known locales and
encodings should be plenty.

> + || 22.LOCALE.GLOBALS.MT.CPP || *8,+ ||
> + || 22.LOCALE.MESSAGES.CPP || *7 ||
> + || 22.LOCALE.MONEY.GET.MT.CPP || *1,+ ||
> + || 22.LOCALE.MONEY.PUT.MT.CPP || *1,+ ||
> + || 22.LOCALE.MONEYPUNCT.CPP || *4 ||
> + || 22.LOCALE.MONEYPUNCT.MT.CPP || *1,+ ||
> + || 22.LOCALE.NUM.GET.CPP || *9 ||
> + || 22.LOCALE.NUM.GET.MT.CPP || *1,+ ||
> + || 22.LOCALE.NUM.PUT.CPP || *9 ||
> + || 22.LOCALE.NUM.PUT.MT.CPP || *1,+ ||
> + || 22.LOCALE.NUMPUNCT.MT.CPP || *1,+ ||
> + || 22.LOCALE.STATICS.MT.CPP || *4,+ ||
> + || 22.LOCALE.TIME.GET.CPP || *5,6 ||

I think in the time tests we look for a locale using a specific
language (e.g., Danish, English, German). The encoding doesn't
really matter, unless we throw in some Asian language as well.
Grepping for the locale using a regular expression was the best
we could do using the old interface. I suppose we'll still be
using a regular expression with the new API, unless you have
support for just language names (e.g., "da, de, en").
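If the new API did accept bare language names, the matching could be roughly as simple as the sketch below. It assumes POSIX-style names of the form language[_territory][.codeset][@modifier]; `select_by_language` and its helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a list such as "da, de, en" into its
// individual language codes, dropping surrounding blanks.
static std::vector<std::string>
split_languages (const std::string& list)
{
    std::vector<std::string> langs;
    std::string::size_type pos = 0;

    while (pos <= list.size ()) {
        std::string::size_type comma = list.find (',', pos);
        if (std::string::npos == comma)
            comma = list.size ();

        const std::string item = list.substr (pos, comma - pos);
        const std::string::size_type first = item.find_first_not_of (' ');
        const std::string::size_type last  = item.find_last_not_of (' ');
        if (std::string::npos != first)
            langs.push_back (item.substr (first, last - first + 1));

        pos = comma + 1;
    }
    return langs;
}

// Return the language part of a name, e.g. "da" for "da_DK.ISO8859-1".
static std::string language_of (const std::string& name)
{
    return name.substr (0, name.find_first_of ("_.@"));
}

// Keep the names whose language appears in the comma-separated list,
// regardless of country or codeset.
std::vector<std::string>
select_by_language (const std::vector<std::string>& names,
                    const std::string&              list)
{
    const std::vector<std::string> langs = split_languages (list);
    std::vector<std::string> result;

    for (std::size_t i = 0; i != names.size (); ++i) {
        const std::string lang = language_of (names [i]);
        for (std::size_t j = 0; j != langs.size (); ++j) {
            if (lang == langs [j]) {
                result.push_back (names [i]);
                break;
            }
        }
    }
    return result;
}
```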

Martin

Re: [Stdcxx Wiki] Update of "LocaleLookup" by TravisVitek

Posted by Martin Sebor <se...@roguewave.com>.
Travis Vitek wrote:
>  
> 
> Travis Vitek wrote:
>> Martin Sebor wrote:
>>
>>> Travis Vitek wrote:
>>>
>>>> + || 22.LOCALE.CONS.MT.CPP || *1,+ ||
>>>> + || 22.LOCALE.CTYPE.CPP || *2 ||
>>>> + || 22.LOCALE.CTYPE.IS.CPP || *2 ||
>>>> + || 22.LOCALE.CTYPE.MT.CPP || *1,+ ||
>>>> + || 22.LOCALE.CTYPE.NARROW.CPP || *2 ||
>>>> + || 22.LOCALE.CTYPE.SCAN.CPP || *2 ||
>>>> + || 22.LOCALE.CTYPE.TOLOWER.CPP || *2 ||
>>>> + || 22.LOCALE.CTYPE.TOUPPER.CPP || *2 ||
>>> I thought the ctype tests were being run in all installed locales,
>>> just like the numpunct one? Which is what we want to move away from.
>>> IMO, exercising a small set (less than a dozen) of known locales and
>>> encodings should be plenty.
>>>
>> Yes, the non-mt ctype tests iterate over each locale for which 
>> the function call `setlocale (LC_CTYPE, name)' succeeds. The mt
>> ctype tests all limit the number of tested locales to 32.
>>
> 
> Any suggestions on which languages/countries/codesets we should
> be testing against for the ctype tests?

I think we should cover a few Western locales and a few Asian ones.
For the first group, here are some candidates: one of each of en_US,
de_*, fr_*, es_*, in a mix of ISO-8859 and UTF-8. For the second
group, I'd consider one of each of ja_JP, ru_*, zh_* in EUC-JP,
Shift_JIS, KOI*, GB*, and UTF-8.

> 
> Reducing the number of selected locales to 32 is pretty easy. Selecting
> which locales is a little more difficult.

You're telling me! :)

> Another issue is that the
> mechanism I have defined doesn't support selecting only one locale for
> each match.

So there's no way to ask for just one locale out of the three
here: ja_JP.{EUC-JP,Shift_JIS,UTF-8}? That may not be too much of
a problem unless each of the expansions matches multiple aliases
of the same locale. We'll see how it goes as we come up with query
strings for each test.
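One possible way to collapse such expansions after a query runs is to keep only the first name for each language_territory prefix. This is a hypothetical post-processing helper, not part of the mechanism described on the Wiki page, and it only catches duplicates that share the same prefix; an alias with an unrelated spelling would still get through.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical post-processing step: given the names a query matched,
// keep only the first locale for each language_territory pair, so that
// "ja_JP.EUC-JP", "ja_JP.Shift_JIS" and "ja_JP.UTF-8" collapse into a
// single entry.
std::vector<std::string>
one_per_language_territory (const std::vector<std::string>& names)
{
    std::set<std::string>    seen;
    std::vector<std::string> result;

    for (std::size_t i = 0; i != names.size (); ++i) {
        // the key is everything before the codeset or modifier
        const std::string key =
            names [i].substr (0, names [i].find_first_of (".@"));

        // set::insert().second is false when the key was already present
        if (seen.insert (key).second)
            result.push_back (names [i]);
    }
    return result;
}
```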

Martin

RE: [Stdcxx Wiki] Update of "LocaleLookup" by TravisVitek

Posted by Travis Vitek <Tr...@roguewave.com>.
 

Travis Vitek wrote:
>
>Martin Sebor wrote:
>
>>Travis Vitek wrote:
>>
>>> + || 22.LOCALE.CONS.MT.CPP || *1,+ ||
>>> + || 22.LOCALE.CTYPE.CPP || *2 ||
>>> + || 22.LOCALE.CTYPE.IS.CPP || *2 ||
>>> + || 22.LOCALE.CTYPE.MT.CPP || *1,+ ||
>>> + || 22.LOCALE.CTYPE.NARROW.CPP || *2 ||
>>> + || 22.LOCALE.CTYPE.SCAN.CPP || *2 ||
>>> + || 22.LOCALE.CTYPE.TOLOWER.CPP || *2 ||
>>> + || 22.LOCALE.CTYPE.TOUPPER.CPP || *2 ||
>>
>>I thought the ctype tests were being run in all installed locales,
>>just like the numpunct one? Which is what we want to move away from.
>>IMO, exercising a small set (less than a dozen) of known locales and
>>encodings should be plenty.
>>
>
>Yes, the non-mt ctype tests iterate over each locale for which 
>the function call `setlocale (LC_CTYPE, name)' succeeds. The mt
>ctype tests all limit the number of tested locales to 32.
>

Any suggestions on which languages/countries/codesets we should
be testing against for the ctype tests?

Reducing the number of selected locales to 32 is pretty easy. Selecting
which locales is a little more difficult. Another issue is that the
mechanism I have defined doesn't support selecting only one locale for
each match.

Travis

RE: [Stdcxx Wiki] Update of "LocaleLookup" by TravisVitek

Posted by Travis Vitek <Tr...@roguewave.com>.
>
>Look at you: you've become a regular wiki formatting artist! ;-)
>

Yup. I'm getting used to clicking through the Wiki formatting guide.

>> + || Test || Criteria ||
>
>Great breakdown. I assume the Criteria are the current conditions.
>We should also try to come up with the ideal conditions that we'll
>want to implement using the new API.

Yeah, that is the hard part.

>> + || 22.LOCALE.CODECVT.MT.CPP || *1,+ ||
>
>Any particular reason these names are all caps? :)
>

That is just the format of the output I got when gathering the names
of all tests that use `rw_locales()'.

>> + || 22.LOCALE.CODECVT.OUT.CPP || *2 ||
>
>IIRC, this test tries to exercise all values of MB_CUR_MAX, not
>just the largest.
>

Nope. It calls a function named `find_mb_locale()', which looks through the
list of installed locales for the one for which the function `get_mb_chars()'
returns the largest value. The comment from `find_mb_locale()' says:

   finds the multibyte locale with the largest MB_CUR_MAX value and
   fills consecutive elements of the `mb_chars' array with multibyte
   characters between 1 and MB_CUR_MAX bytes long for such a locale
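For reference, the character-collecting half of what that comment describes might look roughly like the sketch below; the hypothetical `find_mb_chars` is not the test's actual code. It scans wide characters with `std::wctomb` in the current LC_CTYPE locale and assumes a stateless encoding (no shift sequences) and a `max_wc` no greater than `WCHAR_MAX`.

```cpp
#include <cassert>
#include <climits>   // MB_LEN_MAX
#include <cstddef>
#include <cstdlib>   // MB_CUR_MAX, wctomb
#include <string>
#include <vector>

// Hypothetical sketch: in the current LC_CTYPE locale, collect one
// multibyte character of each length from 1 to MB_CUR_MAX bytes.
// Element [i] of the result holds an (i + 1)-byte character, or is
// empty if no wide character up to max_wc converts to that length.
std::vector<std::string> find_mb_chars (unsigned long max_wc)
{
    const std::size_t max_len = MB_CUR_MAX;
    std::vector<std::string> mb_chars (max_len);

    char buf [MB_LEN_MAX];

    // reset any conversion state held by wctomb()
    (void)std::wctomb (0, L'\0');

    for (unsigned long wc = 1; wc <= max_wc; ++wc) {
        const int len = std::wctomb (buf, static_cast<wchar_t>(wc));

        // -1 means wc has no representation in this locale
        if (len < 1 || static_cast<std::size_t>(len) > max_len)
            continue;

        // keep only the first character found for each length
        if (mb_chars [len - 1].empty ())
            mb_chars [len - 1].assign (buf, static_cast<std::size_t>(len));
    }
    return mb_chars;
}
```

Called after `setlocale (LC_CTYPE, name)` for a UTF-8 locale on a platform with 32-bit `wchar_t`, a `max_wc` of 0x10FFFF should yield characters of 1 through 4 bytes; any longer slots stay empty.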

>> + || 22.LOCALE.CONS.MT.CPP || *1,+ ||
>> + || 22.LOCALE.CTYPE.CPP || *2 ||
>> + || 22.LOCALE.CTYPE.IS.CPP || *2 ||
>> + || 22.LOCALE.CTYPE.MT.CPP || *1,+ ||
>> + || 22.LOCALE.CTYPE.NARROW.CPP || *2 ||
>> + || 22.LOCALE.CTYPE.SCAN.CPP || *2 ||
>> + || 22.LOCALE.CTYPE.TOLOWER.CPP || *2 ||
>> + || 22.LOCALE.CTYPE.TOUPPER.CPP || *2 ||
>
>I thought the ctype tests were being run in all installed locales,
>just like the numpunct one? Which is what we want to move away from.
>IMO, exercising a small set (less than a dozen) of known locales and
>encodings should be plenty.
>

Yes, the non-mt ctype tests iterate over each locale for which the function
call `setlocale (LC_CTYPE, name)' succeeds. The mt ctype tests all limit
the number of tested locales to 32.
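A rough sketch of that selection: keep each candidate name for which `setlocale (LC_CTYPE, name)` succeeds, optionally stopping at a cap such as the 32 used by the mt tests. The helper name `usable_ctype_locales` and passing the cap as a parameter are assumptions for illustration; the candidate list would come from the installed locales.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keep each candidate for which
// setlocale (LC_CTYPE, name) succeeds, stopping after max_count names
// (pass 32 to mimic the limit in the mt ctype tests).
std::vector<std::string>
usable_ctype_locales (const std::vector<std::string>& candidates,
                      std::size_t                     max_count)
{
    std::vector<std::string> result;

    // copy the current name: the next setlocale() call may overwrite it
    const char* const saved = std::setlocale (LC_CTYPE, 0);
    assert (0 != saved);
    const std::string saved_name (saved);

    for (std::size_t i = 0;
         i != candidates.size () && result.size () < max_count; ++i) {
        if (std::setlocale (LC_CTYPE, candidates [i].c_str ()))
            result.push_back (candidates [i]);
    }

    // leave LC_CTYPE the way we found it
    std::setlocale (LC_CTYPE, saved_name.c_str ());
    return result;
}
```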

>> + || 22.LOCALE.TIME.GET.CPP || *5,6 ||
>
>I think in the time tests we look for a locale using a specific
>language (e.g., Danish, English, German). The encoding doesn't
>really matter, unless we throw in some Asian language as well.
>Grepping for the locale using a regular expression was the best
>we could do using the old interface. I suppose we'll still be
>using a regular expression with the new API, unless you have
>support for just language names (e.g., "da, de, en").
>

Yes. Using the grammar from the Wiki page, you can get the name of a single
Danish-language locale for any country, codeset or MB_CUR_MAX value using
the following call:

  const char* name = rw_locale_query (LC_TIME, "da-*-*-*", 1);

If, as an enhancement, we wanted to allow the user to provide just the
language name, or language and country, we could probably do that.
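To illustrate how such a query might be evaluated, here is a sketch of a matcher for four hyphen-separated fields, assumed to be language, country, codeset and MB_CUR_MAX in that order, with `*` matching any value. The field order, the `LocaleInfo` structure and `query_locales` are all hypothetical; the actual grammar is the one defined on the Wiki page.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one installed locale.
struct LocaleInfo {
    std::string name;        // e.g. "da_DK.ISO8859-1"
    std::string language;    // e.g. "da"
    std::string country;     // e.g. "DK"
    std::string codeset;     // e.g. "ISO8859-1"
    std::size_t mb_cur_max;  // e.g. 1
};

// split "da-*-*-*" into its hyphen-separated fields
static std::vector<std::string> split_fields (const std::string& query)
{
    std::vector<std::string> fields;
    std::istringstream strm (query);
    std::string field;
    while (std::getline (strm, field, '-'))
        fields.push_back (field);
    return fields;
}

static bool field_matches (const std::string& pattern,
                           const std::string& value)
{
    return "*" == pattern || pattern == value;
}

// Return up to max_count names of locales matching a query of the
// assumed form "language-country-codeset-mb_cur_max".
std::vector<std::string>
query_locales (const std::vector<LocaleInfo>& locales,
               const std::string&             query,
               std::size_t                    max_count)
{
    std::vector<std::string> result;
    const std::vector<std::string> f = split_fields (query);
    if (f.size () < 4)
        return result;   // malformed query

    // codesets may contain hyphens (e.g. "UTF-8"), so the codeset is
    // everything between the country and the last field
    std::string codeset = f [2];
    for (std::size_t i = 3; i + 1 < f.size (); ++i)
        codeset += "-" + f [i];

    for (std::size_t i = 0;
         i != locales.size () && result.size () < max_count; ++i) {
        const LocaleInfo& loc = locales [i];
        std::ostringstream mb;
        mb << loc.mb_cur_max;

        if (   field_matches (f [0], loc.language)
            && field_matches (f [1], loc.country)
            && field_matches (codeset, loc.codeset)
            && field_matches (f.back (), mb.str ()))
            result.push_back (loc.name);
    }
    return result;
}
```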

Travis