Posted to dev@stdcxx.apache.org by Travis Vitek <tv...@quovadx.com> on 2007/08/19 08:09:01 UTC

expectation vs requirements for locale facets

So I've got myself stuck a little bit, and I'm hoping to get some
direction. The 22.locale.time.get.mt test that I'm writing currently has
some problems. In the test I generate date/time strings like this...

    switch (data.type_) {
    case MyTimeData::get_time:
        *np.put (std::ostreambuf_iterator<char>(&nsb),
                 nio, ' ', &data.time_, 'X') = '\0';
        break;

In the test threads, I attempt to read the data back like this...

    switch (data.type_) {
    case MyTimeData::get_time:
        ng.get_time (std::istreambuf_iterator<char>(&nsb),
                     std::istreambuf_iterator<char>(), nio,
                     state, &local);
        RW_ASSERT (local.tm_hour == data.time_.tm_hour);
        RW_ASSERT (local.tm_min == data.time_.tm_min);
        RW_ASSERT (local.tm_sec == data.time_.tm_sec);
        break;

The problem is that some locales pad their date/time output with
whitespace [like '7. 6. 1988' or ' 7.6.1988'] and I'm unable to use
num_get<>::get_[time,date] to read what is written by num_put<>::put. It
is my understanding that I should be able to do so. Is this a bug, a
known issue, or is it acceptable behavior that I need to code around in
the test?
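
For reference, the round trip in question can be sketched roughly as follows. This is a minimal sketch and not the actual test: the function name time_round_trips is hypothetical, and it uses the time_put<>/time_get<> facets with plain string streams in place of the nsb/nio objects above.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format the time of day with time_put<>::put ('X'),
// then parse it back with time_get<>::get_time. Returns true when the
// parse does not fail and hour, minute and second are unchanged.
bool time_round_trips (const std::locale &loc, const std::tm &in)
{
    typedef std::time_put<char> TimePut;
    typedef std::time_get<char> TimeGet;

    std::ostringstream out;
    out.imbue (loc);
    std::use_facet<TimePut>(loc).put (
        std::ostreambuf_iterator<char>(out), out, ' ', &in, 'X');

    std::istringstream inp (out.str ());
    inp.imbue (loc);

    std::tm parsed = std::tm ();
    std::ios_base::iostate state = std::ios_base::goodbit;
    std::use_facet<TimeGet>(loc).get_time (
        std::istreambuf_iterator<char>(inp),
        std::istreambuf_iterator<char>(), inp, state, &parsed);

    return !(state & std::ios_base::failbit)
        && parsed.tm_hour == in.tm_hour
        && parsed.tm_min  == in.tm_min
        && parsed.tm_sec  == in.tm_sec;
}
```

Calling it with std::locale::classic () and a tm holding 03:02:01 exercises the unpadded case; the padded locales are the ones where the result is in doubt.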

Travis 

Re: expectation vs requirements for locale facets

Posted by Martin Sebor <se...@roguewave.com>.
Travis Vitek wrote:
>  
> 
> Travis Vitek wrote:
>> Travis Vitek wrote:
>>> The problem is that some locales pad their date/time output with
>>> whitespace [like '7. 6. 1988' or ' 7.6.1988'] and I'm unable to use
>>> num_get<>::get_[time,date] to read what is written by num_put<>::put.
>>> It is my understanding that I should be able to do so. Is this a bug, a
>>> known issue, or is it acceptable behavior that I need to code around in
>>> the test?
>> Whoops, obviously I am talking about the time_[get,put] facets here.
>>
> 
> Okay, so I've done the required research. These are the relevant
> sections of the standard that I was able to find.
> 
>   [22.2.5.1 p1]  Each  get  member parses  a  format  as  produced by a
>   corresponding format specifier to  time_put<>::put.  If  the sequence
>   being parsed matches the correct format, the corresponding members of
>   the  struct tm  argument are  set  to  the values used to produce the
>   sequence; otherwise either an error is reported or unspecified values
>   are assigned.  
> 
>   [22.2.5.1.2 p2] Effects: Reads characters starting at  s until it has
>   extracted  those  struct tm members, and remaining format characters,
>   used by  time_put<>::put  to produce  the  format specified by 'X' or
>   until it encounters an error.
> 
> Unless I'm missing something obvious, it appears that the output of the
> time_put<> facet is required to be parseable by the time_get<> facet.

Yes. But notice the text doesn't say anything about time_put_byname or
time_get_byname ;-) The C++ standard (or even the C standard for that
matter) isn't going to be of help here.

> Of
> course that isn't what I'm seeing.

Test case?

> 
> One case that fails is the day of the month, '%e'. With some locales '%x'
> expands out to '%m/%e/%Y'. Anyways, when putting the data,
> __rw_get_time_put_data() correctly sets the field width to 2. When we
> attempt to get the data back out of the stream, no width is specified.
> There is no code in place in the get_date() call stack to deal with
> width, and the block of code that does the actual parsing doesn't have
> any concept of field width either [_time_get.cc:284].

It's hard to say from just looking at the code (and I haven't looked
very carefully). In general, we [try to] implement the POSIX
semantics, so if it works with strptime()/strftime() it should work
with our time_put_byname/ time_get_byname.
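
The strftime()/strptime() comparison can be tried directly. The following is a minimal sketch, assuming a POSIX platform that declares strptime() in <time.h>; the function name posix_day_round_trip is hypothetical. Whether a space-padded single-digit day parses is platform-dependent, which is exactly the open question, so the code reports failure rather than assuming a result.

```cpp
#include <cassert>
#include <ctime>
#include <time.h>   // strptime() is POSIX, not ISO C or ISO C++

// Hypothetical helper: format tm_mday with strftime ("%e"), then parse
// the result with strptime ("%e"). Returns the parsed day of the month,
// or -1 if formatting fails or the string is not consumed completely.
int posix_day_round_trip (int mday)
{
    std::tm in = std::tm ();
    in.tm_mday = mday;

    char buf [8];
    if (0 == strftime (buf, sizeof buf, "%e", &in))
        return -1;

    std::tm out = std::tm ();
    const char* const end = strptime (buf, "%e", &out);
    if (0 == end || *end != '\0')
        return -1;   // the C library rejected its own output

    return out.tm_mday;
}
```

A two-digit day such as 31 needs no padding, so it is the single-digit days that show how a given C library treats the leading space.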

> 
> Worse yet is that the tests actually verify this bad behavior. The
> 22.locale.time.get test verifies that the '%e' format fails if there is
> any leading whitespace.

If we test this behavior it's gotta be right ;-) Where does POSIX
say leading spaces must be skipped? I see this under %e: Equivalent
to %d. And under %d: The day of the month [01,31]; leading zeros
are permitted but not required. Nothing about ignoring spaces.

> 
>   // %e Equivalent to %d; leading zeros are permitted but not required.
>   STEP ("%e: equivalent to %d");
>   TEST (T (0, 0, 0,  1),  "01", 2, "e", 0, Eof);
>   TEST (T (0, 0, 0,  9),   "9", 1, "e", 0, Eof);
>   TEST (T (0, 0, 0, 31),  "31", 2, "e", 0, Eof);
>   TEST (T (0, 0, 0,  0),   "0", 1, "e", 0, Eof | Fail);
>   // leading whitespace not allowed
>   TEST (T (0, 0, 0,  0),  " 2", 0, "e", 0, Fail);  // *** problem
>   TEST (T (0, 0, 0,  0),  "99", 2, "e", 0, Eof | Fail);
> 
> The 22.locale.time.put test verifies the leading space is there when
> writing the '%e' format.
> 
>   // %e: the day of the month as a decimal number (1-31);
>   //     a single digit is preceded by a space. [tm_mday]
>   rw_info (0, 0, __LINE__, "%%e: the day of the month as a decimal number");
>   TEST (T (), "%e", 0, 0, ' ', "%e");
>   TEST (T (), "%e", 0, 0, ' ', " 1"); // *** problem
>   TEST (T (), " 1", 0, 0, ' ', "%e"); // *** problem
>   TEST (T (-1), "%e", 0, 0, ' ', "%e");
> 
> Feedback?

Without too much research, my first take on this is that it will
probably fall under the "not every output format can be parsed"
category. But we need to do some more reading to confirm this
hypothesis.

Martin


RE: expectation vs requirements for locale facets

Posted by Travis Vitek <tv...@quovadx.com>.
 

Travis Vitek wrote:
> 
>Travis Vitek wrote:
>>
>>The problem is that some locales pad their date/time output with
>>whitespace [like '7. 6. 1988' or ' 7.6.1988'] and I'm unable to use
>>num_get<>::get_[time,date] to read what is written by num_put<>::put.
>>It is my understanding that I should be able to do so. Is this a bug, a
>>known issue, or is it acceptable behavior that I need to code around in
>>the test?
>
>Whoops, obviously I am talking about the time_[get,put] facets here.
>

Okay, so I've done the required research. These are the relevant
sections of the standard that I was able to find.

  [22.2.5.1 p1]  Each  get  member parses  a  format  as  produced by a
  corresponding format specifier to  time_put<>::put.  If  the sequence
  being parsed matches the correct format, the corresponding members of
  the  struct tm  argument are  set  to  the values used to produce the
  sequence; otherwise either an error is reported or unspecified values
  are assigned.  

  [22.2.5.1.2 p2] Effects: Reads characters starting at  s until it has
  extracted  those  struct tm members, and remaining format characters,
  used by  time_put<>::put  to produce  the  format specified by 'X' or
  until it encounters an error.

Unless I'm missing something obvious, it appears that the output of the
time_put<> facet is required to be parseable by the time_get<> facet. Of
course that isn't what I'm seeing.

One case that fails is the day of the month, '%e'. With some locales '%x'
expands out to '%m/%e/%Y'. Anyways, when putting the data,
__rw_get_time_put_data() correctly sets the field width to 2. When we
attempt to get the data back out of the stream, no width is specified.
There is no code in place in the get_date() call stack to deal with
width, and the block of code that does the actual parsing doesn't have
any concept of field width either [_time_get.cc:284].
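
To make the field-width idea concrete, here is a minimal sketch of what width-aware parsing of '%e' could look like. The helper parse_padded_day is hypothetical and is not taken from _time_get.cc; it only illustrates treating leading blanks as padding that counts against a field width of 2.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not stdcxx code: parse a day of the month from a
// field of at most `width` characters starting at `pos`. Leading blanks
// are accepted as padding and count against the width. Returns the day
// in [1,31] and advances `pos` past the field, or returns -1 and leaves
// `pos` unchanged on failure.
int parse_padded_day (const std::string &s, std::string::size_type &pos,
                      int width = 2)
{
    std::string::size_type i = pos;
    int used = 0;

    // skip padding, but leave room for at least one digit
    while (used < width - 1 && i < s.size () && s [i] == ' ') {
        ++i;
        ++used;
    }

    int value  = 0;
    int digits = 0;
    while (used < width && i < s.size ()
           && std::isdigit (static_cast<unsigned char>(s [i]))) {
        value = value * 10 + (s [i] - '0');
        ++i;
        ++used;
        ++digits;
    }

    if (0 == digits || value < 1 || value > 31)
        return -1;

    pos = i;
    return value;
}
```

With this approach " 7", "07" and "7" all yield 7, while "99" and "0" are still rejected, which matches the range checks in the existing tests apart from the leading-space case.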

Worse yet is that the tests actually verify this bad behavior. The
22.locale.time.get test verifies that the '%e' format fails if there is
any leading whitespace.

  // %e Equivalent to %d; leading zeros are permitted but not required.
  STEP ("%e: equivalent to %d");
  TEST (T (0, 0, 0,  1),  "01", 2, "e", 0, Eof);
  TEST (T (0, 0, 0,  9),   "9", 1, "e", 0, Eof);
  TEST (T (0, 0, 0, 31),  "31", 2, "e", 0, Eof);
  TEST (T (0, 0, 0,  0),   "0", 1, "e", 0, Eof | Fail);
  // leading whitespace not allowed
  TEST (T (0, 0, 0,  0),  " 2", 0, "e", 0, Fail);  // *** problem
  TEST (T (0, 0, 0,  0),  "99", 2, "e", 0, Eof | Fail);

The 22.locale.time.put test verifies the leading space is there when
writing the '%e' format.

  // %e: the day of the month as a decimal number (1-31);
  //     a single digit is preceded by a space. [tm_mday]
  rw_info (0, 0, __LINE__, "%%e: the day of the month as a decimal number");
  TEST (T (), "%e", 0, 0, ' ', "%e");
  TEST (T (), "%e", 0, 0, ' ', " 1"); // *** problem
  TEST (T (), " 1", 0, 0, ' ', "%e"); // *** problem
  TEST (T (-1), "%e", 0, 0, ' ', "%e");

Feedback?

Travis

RE: expectation vs requirements for locale facets

Posted by Travis Vitek <tv...@quovadx.com>.
 
Travis Vitek wrote:
>
>The problem is that some locales pad their date/time output with
>whitespace [like '7. 6. 1988' or ' 7.6.1988'] and I'm unable to use
>num_get<>::get_[time,date] to read what is written by num_put<>::put. It
>is my understanding that I should be able to do so. Is this a bug, a
>known issue, or is it acceptable behavior that I need to code around in
>the test?

Whoops, obviously I am talking about the time_[get,put] facets here.

RE: expectation vs requirements for locale facets

Posted by Mark Brown <mb...@inbox.com>.
> -----Original Message-----
> From: tvitek@quovadx.com
> Sent: Mon, 20 Aug 2007 05:20:32 -0600
> To: stdcxx-dev@incubator.apache.org
> Subject: RE: expectation vs requirements for locale facets
> 
> 
> 
> >Mark Brown wrote:
> >
> >In my experience, the time_get facet isn't always able to
> >reliably parse international times and cannot parse every time
> >string produced by the time_put facet.
> 
> Yes, I see two different problems here. You can generate output with
> time_put<>::put for which there is no matching time_get<> method for
> parsing that data. What I mean is that you can easily format "%S %p" onto
> the stream, but there is no method in the time_get<> facet for reading
> that formatted data back. The stdcxx implementation provides an extension
> that allows you to do this, but it's an extension.
> 
> The other problem is the one that I'm more concerned about.
> 
> >I don't remember ever
> >having problems with spaces though.
> 
> Yeah, that is the problem. It is my interpretation that this is a
> requirement, but I'm not sure that anyone agrees with me on this. I don't
> really see the point in defining a system for input/output of times and
> dates if you can't read in the values that you write out.

Yeah, that wouldn't be a terribly useful system...

> 
> >On Linux at least, stdcxx
> >has no problems skipping leading space in time strings.
> 
> That is inconsistent with what I'm seeing. [see partial failure lists
> below]

You're right! I was sure I had used stdcxx to parse time strings with spaces in them but now that I've tried it I must acknowledge it really doesn't work. My sincere apologies for confusing the discussion!

-- Mark

RE: expectation vs requirements for locale facets

Posted by Travis Vitek <tv...@quovadx.com>.
 

>Mark Brown wrote:
>
>In my experience, the time_get facet isn't always able to 
>reliably parse international times and cannot parse every time 
>string produced by the time_put facet.

Yes, I see two different problems here. You can generate output with time_put<>::put for which there is no matching time_get<> method for parsing that data. What I mean is that you can easily format "%S %p" onto the stream, but there is no method in the time_get<> facet for reading that formatted data back. The stdcxx implementation provides an extension that allows you to do this, but it's an extension.
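
As an illustration of the first problem, the pattern overload of time_put<>::put accepts an arbitrary format such as "%S %p". A minimal sketch follows; the wrapper name format_time is hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical wrapper: format `t` with a strftime-style pattern using
// the pattern overload of time_put<>::put.
std::string format_time (const std::locale &loc, const std::tm &t,
                         const std::string &pattern)
{
    std::ostringstream out;
    out.imbue (loc);
    std::use_facet<std::time_put<char> >(loc).put (
        std::ostreambuf_iterator<char>(out), out, ' ', &t,
        pattern.data (), pattern.data () + pattern.size ());
    return out.str ();
}
```

The get_time and get_date members discussed in this thread take no pattern argument, so there is no equally direct way to read such a string back with them.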

The other problem is the one that I'm more concerned about.

>I don't remember ever 
>having problems with spaces though.

Yeah, that is the problem. It is my interpretation that this is a requirement, but I'm not sure that anyone agrees with me on this. I don't really see the point in defining a system for input/output of times and dates if you can't read in the values that you write out.

>On Linux at least, stdcxx 
>has no problems skipping leading space in time strings.

That is inconsistent with what I'm seeing. [see partial failure lists below]

>Which 
>locale and what operating system does it not do so for you?
>

Well, so far I've only tried linux/gcc and win32/vc8 both with their native Standard C++ Library implementations and with the stdcxx implementations. I get failures in each of them with various locales. I attached my testcase to another post in this thread [http://tinyurl.com/2qp7py]. Here is a spew including a few failing locales for each configuration...

win32/vc8/stdcxx

  string= 7.06.1908     locale=Croatian
  string= 7.06.1908     locale=Czech
  string= 7/06/1908     locale=Dutch_Belgium
  string= 7.06.1908     locale=Estonian
  string= 7.06.1908     locale=Finnish
  string= 7/06/1908     locale=Greek
  string= 7.06.1908     locale=Icelandic
  string= 7. 06. 1908   locale=Slovak
  string= 7.06.1908     locale=Slovenian
  string=06/ 7/1908     locale=Swahili
  string= 7-06-1908     locale=Dutch_Netherlands
  string= 7/06/1908     locale=English_Australia
  string=06/ 7/1908     locale=English_Zimbabwe
  string= 7/06/1908     locale=French_Belgium
  string= 7/06/1908     locale=Portuguese_Brazil
  string= 7.06.1908     locale=Swedish_Finland
  string=03:02:01       locale=Afrikaans

win32/vc8/dinkum

  string=1908/06/07     locale=Afrikaans
  string=1908-06-07     locale=Albanian
  string=07.06.1908     locale=Belarusian
  string=07.6.1908 π.   locale=Bulgarian
  string=7.6.1908       locale=Croatian
  string=7.6.1908       locale=Czech
  string=07-06-1908     locale=Danish
  string=7.06.1908      locale=Estonian
  string=7.6.1908       locale=Finnish
  string=1908. 06. 07.  locale=Hungarian
  string=7.6.1908       locale=Icelandic
  string=1908.06.07.    locale=Latvian
  string=1908.06.07     locale=Lithuanian
  string=07.06.1908     locale=Norwegian
  string=1908-06-07     locale=Polish
  string=07.06.1908     locale=Romanian
  string=07.06.1908     locale=Russian
  string=7. 6. 1908     locale=Slovak
  string=7.6.1908       locale=Slovenian
  string=1908-06-07     locale=Swedish
  string=07.06.1908     locale=Tatar
  string=07.06.1908     locale=Turkish
  string=07.06.1908     locale=Ukrainian
  string=7-6-1908       locale=Dutch_Netherlands
  string=1908-06-07     locale=French_Canada
  string=07.06.1908     locale=French_Switzerland
  string=07.06.1908     locale=German_Austria
  string=07.06.1908     locale=German_Germany
  string=07.06.1908     locale=German_Liechtenstein
  string=07.06.1908     locale=German_Luxembourg
  string=07.06.1908     locale=German_Switzerland
  string=07.06.1908     locale=Italian_Switzerland
  string=07-06-1908     locale=Portuguese_Portugal
  string=07-06-1908     locale=Spanish_Chile
  string=7.6.1908       locale=Swedish_Finland
  string=3.02.01        locale=Italian_Italy

linux/gcc3463/stdcxx

  string=07/06/08      locale=thai
  string= 7.06.1908    locale=bg_BG
  string=07/06/08      locale=lo_LA
  string=07/06/08      locale=th_TH
  string= 3:02:01      locale=aa_DJ
  string= 3:02:01      locale=aa_ER
  string= 3:02:01      locale=aa_ET
  string= 3:02:01      locale=am_ET
  string= 3,02,01      locale=bg_BG
  string= 3:02:01      locale=om_ET
  string= 3:02:01      locale=om_KE
  string= 3:02:01      locale=so_DJ
  string= 3:02:01      locale=so_ET
  string= 3:02:01      locale=so_KE
  string= 3:02:01      locale=so_SO
  string= 3:02:01      locale=ti_ER
  string= 3:02:01      locale=ti_ET

linux/gcc3463/gnustdlib

  string= 3:02:01      locale=aa_DJ
  string= 3:02:01      locale=aa_ER
  string= 3:02:01      locale=aa_ET
  string= 3:02:01      locale=am_ET
  string= 3,02,01      locale=bg_BG
  string= 3:02:01      locale=om_ET
  string= 3:02:01      locale=om_KE
  string= 3:02:01      locale=so_DJ
  string= 3:02:01      locale=so_ET
  string= 3:02:01      locale=so_KE
  string= 3:02:01      locale=so_SO
  string= 3:02:01      locale=ti_ER
  string= 3:02:01      locale=ti_ET
  string=03:02:01 AM   locale=tl_PH
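
A per-locale check that prints lines in the format above might look roughly like this. It is a sketch under stated assumptions, not the attached testcase: the names RoundTrip and check_date are hypothetical, and it compares only the day and month because a two-digit year in the output cannot identify the century.

```cpp
#include <cassert>
#include <ctime>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

enum RoundTrip { rt_ok, rt_failed, rt_no_locale };

// Hypothetical check: put a date with 'x' in the named locale, read it
// back with get_date, and print a "string=... locale=..." line when the
// parse fails or the day or month differ.
RoundTrip check_date (const char *name, const std::tm &in)
{
    try {
        const std::locale loc (name);   // throws if the locale is unknown

        std::ostringstream out;
        out.imbue (loc);
        std::use_facet<std::time_put<char> >(loc).put (
            std::ostreambuf_iterator<char>(out), out, ' ', &in, 'x');

        std::istringstream inp (out.str ());
        inp.imbue (loc);

        std::tm got = std::tm ();
        std::ios_base::iostate state = std::ios_base::goodbit;
        std::use_facet<std::time_get<char> >(loc).get_date (
            std::istreambuf_iterator<char>(inp),
            std::istreambuf_iterator<char>(), inp, state, &got);

        if (   !(state & std::ios_base::failbit)
            && got.tm_mday == in.tm_mday
            && got.tm_mon  == in.tm_mon)
            return rt_ok;

        std::cout << "string=" << out.str ()
                  << " locale=" << name << '\n';
        return rt_failed;
    }
    catch (const std::runtime_error&) {
        return rt_no_locale;   // locale not installed on this system
    }
}
```

Running it over the installed locale names with a date such as 7 June 1908 would give a list comparable to the ones above, though the exact set of failures depends on the library and the locales available.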

>-- Mark
>