Posted to dev@stdcxx.apache.org by "Mark Brown (JIRA)" <ji...@apache.org> on 2007/06/04 06:55:15 UTC

[jira] Created: (STDCXX-435) [Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected

[Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected
-------------------------------------------------------------------------

                 Key: STDCXX-435
                 URL: https://issues.apache.org/jira/browse/STDCXX-435
             Project: C++ Standard Library
          Issue Type: Bug
          Components: 22. Localization
    Affects Versions: 4.1.3
         Environment: gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)
            Reporter: Mark Brown


When compiled with gcc 4.1.1 on Linux, the program below runs successfully to completion, as it should. When compiled with stdcxx, the facet returns a to_next value that is greater than the number of internal (wchar_t) characters actually produced by the conversion, and consequently the program aborts.

$ cat t.cpp && make t && ./t
#include <cassert>
#include <cwchar>
#include <locale>

int main ()
{
    const std::locale utf8 ("en_US.UTF-8");
    typedef std::codecvt<wchar_t, char, std::mbstate_t> UTF8_Cvt;

    const UTF8_Cvt &cvt = std::use_facet<UTF8_Cvt>(utf8);

    const char src[] = "abc";
    wchar_t dst [2] = { L'\0' };

    const char* from_next;

    wchar_t* to_next;

    std::mbstate_t state = std::mbstate_t ();

    const std::codecvt_base::result res =
        cvt.in (state,
                src, src + 1, from_next,
                dst, dst + 2, to_next);

    assert (1 == from_next - src);
    assert (1 == to_next - dst);
    assert ('a' == dst [0]);
}

gcc -c -I/home/mbrown/stdcxx/include/ansi -D_RWSTDDEBUG    -I/home/mbrown/stdcxx/include -I/build/mbrown/stdcxx-gcc-4.1.1-11S/include -I/home/mbrown/stdcxx/examples/include  -pedantic -nostdinc++ -g   -W -Wall -Wcast-qual -Winline -Wshadow -Wwrite-strings -Wno-long-long -Wcast-align   t.cpp
t.cpp: In function 'int main()':
t.cpp:21: warning: unused variable 'res'
gcc t.o -o t  -L/build/mbrown/stdcxx-gcc-4.1.1-11S/lib  -lstd11S -lsupc++ -lm 
t: t.cpp:26: int main(): Assertion `1 == from_next - src' failed.
Aborted


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (STDCXX-435) [Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-435:
--------------------------------

         Severity: Incorrect Behavior
    Fix Version/s:     (was: 4.2)
                   4.2.1

It's too late to do this in time for 4.2.0. Rescheduled for 4.2.1.



[jira] Assigned: (STDCXX-435) [Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor reassigned STDCXX-435:
-----------------------------------

    Assignee: Martin Sebor



[jira] Commented: (STDCXX-435) [Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501407 ] 

Martin Sebor commented on STDCXX-435:
-------------------------------------

The problem seems to be caused by the fact that in libc mode (i.e., when using the underlying C library) codecvt_byname calls mbsrtowcs() (via __rw_libc_do_in) to convert the source sequence without first making sure it is NUL-terminated. mbsrtowcs() converts the source sequence up to the terminating NUL (or an invalid byte), or until it has produced the requested number of destination characters. When the destination buffer has room for more characters than the source sequence contains, the function just keeps converting past the end of the source.
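
To illustrate the difference, here is a minimal sketch (not the stdcxx implementation) that contrasts an mbsrtowcs()-based conversion with one that honors the end of the source range by calling mbrtowc() one character at a time. The names unbounded_in and bounded_in are hypothetical, and error handling is reduced to stopping at the first invalid or incomplete sequence; a real facet would presumably map those cases to codecvt_base::error and codecvt_base::partial.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper mirroring the behavior described above: mbsrtowcs()
// has no "end of source" parameter, so it stops only at a NUL, an invalid
// byte, or after writing to_len wide characters.
std::size_t unbounded_in (const char *from, wchar_t *to, std::size_t to_len)
{
    std::mbstate_t state = std::mbstate_t ();
    const char *p = from;
    return std::mbsrtowcs (to, &p, to_len, &state);
}

// Hypothetical helper: converts [from, from_end) into [to, to_end) one
// character at a time, so that no byte at or past from_end is examined.
// Returns the number of wide characters written.
std::size_t bounded_in (std::mbstate_t &state,
                        const char *from, const char *from_end,
                        const char *&from_next,
                        wchar_t *to, wchar_t *to_end,
                        wchar_t *&to_next)
{
    from_next = from;
    to_next   = to;

    while (from_next != from_end && to_next != to_end) {
        // the third argument limits how many source bytes may be read
        const std::size_t n =
            std::mbrtowc (to_next, from_next,
                          std::size_t (from_end - from_next), &state);

        if (n == std::size_t (-1) || n == std::size_t (-2))
            break;   // invalid or incomplete sequence: stop (sketch only)

        // mbrtowc() returns 0 for a NUL character; assume it occupies
        // one source byte, as it does in UTF-8
        from_next += n ? n : 1;
        ++to_next;
    }

    return std::size_t (to_next - to);
}
```

Called with the arguments from the test case (src, src + 1, dst, dst + 2), bounded_in is limited to the single source byte it was given, whereas unbounded_in, handed the same buffers, can fill both destination elements because it only stops at the NUL that terminates "abc" or when the destination is full.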



[jira] Updated: (STDCXX-435) [Linux] std::codecvt_byname("*.UTF-8").in() to_next greater than expected

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-435:
--------------------------------

         Priority: Critical  (was: Major)
    Fix Version/s: 4.2

This is critical because it affects all UTF-8 files. Scheduled for 4.2.0.
