Posted to issues@stdcxx.apache.org by "Martin Sebor (JIRA)" <ji...@apache.org> on 2008/05/07 06:09:55 UTC

[jira] Created: (STDCXX-914) sstream ctors inefficient in reentrant modes

sstream ctors inefficient in reentrant modes
--------------------------------------------

                 Key: STDCXX-914
                 URL: https://issues.apache.org/jira/browse/STDCXX-914
             Project: C++ Standard Library
          Issue Type: Improvement
          Components: 27. Input/Output
    Affects Versions: 4.2.1, 4.2.0, 4.1.4, 4.1.3, 4.1.2
            Reporter: Martin Sebor
            Priority: Critical
             Fix For: 4.2.2


As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o], stream ctors in thread-safe builds are inefficient because every stream initializes its mutex data member, even streams that never use it. As soon as binary compatibility rules permit, we should remove the mutex and/or defer its initialization until it is needed. Deferred initialization might be possible as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.
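
For illustration only, deferred initialization could look roughly like the sketch below. It is not stdcxx code: the class name lazy_locked and its members are hypothetical, and it assumes C++11 std::atomic and std::mutex rather than the library's own synchronization primitives.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a stream base class; not stdcxx code.
class lazy_locked {
    // Stays null until some caller actually asks for the mutex.
    mutable std::atomic<std::mutex*> mutex_{nullptr};

public:
    lazy_locked() = default;   // construction does no mutex setup at all
    lazy_locked(const lazy_locked&) = delete;
    lazy_locked& operator=(const lazy_locked&) = delete;

    ~lazy_locked() { delete mutex_.load(std::memory_order_acquire); }

    // True once the mutex has been created by a call to mutex().
    bool mutex_created() const {
        return mutex_.load(std::memory_order_acquire) != nullptr;
    }

    // Creates the mutex on first use; every later call returns the same one.
    std::mutex& mutex() const {
        std::mutex* m = mutex_.load(std::memory_order_acquire);
        if (m == nullptr) {
            std::mutex* fresh = new std::mutex;
            if (mutex_.compare_exchange_strong(m, fresh,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
                m = fresh;       // this thread installed its mutex
            } else {
                delete fresh;    // another thread won; m now points at its mutex
            }
        }
        assert(m != nullptr);
        return *m;
    }
};
```

The trade-off in this sketch is a heap allocation and a compare-and-swap on first use, in exchange for a constructor that never touches the mutex.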

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594770#action_12594770 ] 

sebor edited comment on STDCXX-914 at 11/8/08 9:35 AM:
--------------------------------------------------------------

gprof flat profile for a 15D build with gcc 4.3.0 on x86_64:
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 33.40      0.08     0.08  1000000    80.15   200.37  std::string lex_cast<std::string, long>(long const&)
 16.70      0.12     0.04  1000000    40.07    90.17  std::stringstream::~stringstream()
 16.70      0.16     0.04                             main
 12.52      0.19     0.03  1000000    30.06    40.07  std::iostream::~iostream()
  8.35      0.21     0.02  1000000    20.04    20.04  std::iostream::iostream(std::streambuf*)
  4.17      0.22     0.01  1000000    10.02    10.02  std::ostream::~ostream()
  4.17      0.23     0.01  1000000    10.02    30.06  std::stringstream::stringstream(__rw::__rw_openmode)
  4.17      0.24     0.01  1000000    10.02    10.02  std::ios::~ios()
  0.00      0.24     0.00 14000002     0.00     0.00  data_start
  0.00      0.24     0.00  1000001     0.00     0.00  std::allocator<char>::allocator()
  0.00      0.24     0.00  1000000     0.00     0.00  __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  0.00      0.24     0.00  1000000     0.00     0.00  std::stringstream::rdbuf() const
  0.00      0.24     0.00  1000000     0.00     0.00  std::istream::~istream()
{noformat}

gprof call graph:
{noformat}
granularity: each sample hit covers 2 byte(s) for 4.16% of 0.24 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04    0.20                 main [1]
                0.08    0.12 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
                0.00    0.00 3000002/14000002     data_start [9]
                0.00    0.00       1/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.08    0.12 1000000/1000000     main [1]
[2]     83.3    0.08    0.12 1000000         std::string lex_cast<std::string, long>(long const&) [2]
                0.04    0.05 1000000/1000000     std::stringstream::~stringstream() [3]
                0.01    0.02 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 5000000/14000002     data_start [9]
                0.00    0.00 1000000/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.04    0.05 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[3]     37.5    0.04    0.05 1000000         std::stringstream::~stringstream() [3]
                0.03    0.01 1000000/1000000     std::iostream::~iostream() [4]
                0.01    0.00 1000000/1000000     std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.03    0.01 1000000/1000000     std::stringstream::~stringstream() [3]
[4]     16.7    0.03    0.01 1000000         std::iostream::~iostream() [4]
                0.01    0.00 1000000/1000000     std::ostream::~ostream() [7]
                0.00    0.00 1000000/1000000     std::istream::~istream() [18]
-----------------------------------------------
                0.01    0.02 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[5]     12.5    0.01    0.02 1000000         std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.02    0.00 1000000/1000000     std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
                0.00    0.00 1000000/1000000     std::stringstream::rdbuf() const [17]
-----------------------------------------------
                0.02    0.00 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
[6]      8.3    0.02    0.00 1000000         std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::iostream::~iostream() [4]
[7]      4.2    0.01    0.00 1000000         std::ostream::~ostream() [7]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::stringstream::~stringstream() [3]
[8]      4.2    0.01    0.00 1000000         std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.00    0.00 1000000/14000002     std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     std::stringstream::~stringstream() [3]
                0.00    0.00 2000000/14000002     std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 3000002/14000002     main [1]
                0.00    0.00 5000000/14000002     std::string lex_cast<std::string, long>(long const&) [2]
[9]      0.0    0.00    0.00 14000002         data_start [9]
-----------------------------------------------
                0.00    0.00       1/1000001     main [1]
                0.00    0.00 1000000/1000001     std::string lex_cast<std::string, long>(long const&) [2]
[15]     0.0    0.00    0.00 1000001         std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::allocator<__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> > >::destroy(__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >*) [19]
[16]     0.0    0.00    0.00 1000000         __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref() [16]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
[17]     0.0    0.00    0.00 1000000         std::stringstream::rdbuf() const [17]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::iostream::~iostream() [4]
[18]     0.0    0.00    0.00 1000000         std::istream::~istream() [18]
-----------------------------------------------
{noformat}
and index by function name:
{noformat}
   [2] std::string lex_cast<std::string, long>(long const&)
   [4] std::iostream::~iostream()
   [8] std::ios::~ios()
  [16] __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  [18] std::istream::~istream()
   [9] data_start
  [17] std::stringstream::rdbuf() const
   [7] std::ostream::~ostream()
   [1] main
  [15] std::allocator<char>::allocator()
   [5] std::stringstream::stringstream(__rw::__rw_openmode)
   [6] std::iostream::iostream(std::streambuf*)
   [3] std::stringstream::~stringstream()
{noformat}
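
The test program is not included in this comment. Judging only from the call graph above (one stringstream constructed and destroyed per lex_cast call, called 1000000 times from main), it may have looked roughly like the following sketch. The body of lex_cast and the driver convert_many are assumptions made for illustration, not the actual test case.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Assumed shape of the helper named in the profile; the real source is not shown.
template <class To, class From>
To lex_cast(const From& from) {
    std::stringstream ss;   // one stream is constructed and destroyed per call
    To to;
    ss << from;             // format the source value into the stream
    ss >> to;               // parse it back out as the target type
    return to;
}

// Hypothetical driver: converts 0..n-1 to strings and sums their lengths
// so that the conversions have an observable result.
std::size_t convert_many(long n) {
    std::string s;
    std::size_t total = 0;
    for (long i = 0; i < n; ++i) {
        s = lex_cast<std::string>(i);
        total += s.size();
    }
    return total;
}
```

Because every call pays for a full stream construction and destruction, per-stream setup cost such as mutex initialization is multiplied by the number of conversions.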

>         Attachments: stdcxx-914.gprof
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h


[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646015#action_12646015 ] 

sebor edited comment on STDCXX-914 at 11/8/08 10:33 AM:
---------------------------------------------------------------

Here are the top 10 functions from the {{[stdcxx-914-gprof-gcc-4.3.0-12S.txt|https://issues.apache.org/jira/secure/attachment/12393570/stdcxx-914-gprof-gcc-4.3.0-12S.txt]}} attachment. Looks like the {{_C_is_managed()}} function could stand to be optimized...

{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.40      2.99     2.99 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
 13.98      4.28     1.29 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  6.83      4.91     0.63 10000000     0.00     0.00  std::istream& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::istream&, std::string&)
  4.98      5.37     0.46 10000001     0.00     0.00  std::string::operator=(std::string const&)
  4.98      5.83     0.46 10000000     0.00     0.00  std::ostream& __rw::__rw_insert<char, std::char_traits<char>, long>(std::ostream&, long)
  4.66      6.26     0.43 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  3.36      6.57     0.31 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  3.25      6.87     0.30 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  3.03      7.15     0.28                             main
  2.49      7.38     0.23 30000000     0.00     0.00  std::locale::~locale()
  ...
{noformat}

>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h


[jira] Updated: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-914:
--------------------------------

    Attachment:     (was: stdcxx-914.gprof)

>         Attachments: stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h


[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646011#action_12646011 ] 

sebor edited comment on STDCXX-914 at 11/8/08 10:04 AM:
---------------------------------------------------------------

Attached the gprof output for the head of trunk compiled with gcc 4.3.0 in the 12S build type.

>         Attachments: stdcxx-914-gprof-gcc-4.3.0-12S.txt, stdcxx-914.gprof
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h


[jira] Updated: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-914:
--------------------------------

    Attachment: stdcxx-914.gprof

Attached full gprof output for a library and test both compiled with {{-D_RWSTD_USE_STRING_ATOMIC_OP}} on the command line (see STDCXX-162).

>         Attachments: stdcxx-914.gprof
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h


[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646021#action_12646021 ] 

sebor edited comment on STDCXX-914 at 11/8/08 4:09 PM:
--------------------------------------------------------------

Here's a superficially tested patch to optimize {{\_\_rw_locale::_C_is_managed()}} and {{\_\_rw_locale::_C_manage()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It cuts the run time of the test case by about 36% (down from 18.905s to 12.147s on an Intel Core 2 6600 running at 2.40GHz) by having {{\_\_rw_locale::_C_is_managed()}} avoid expensive tests for named facets in the "C" locale and by using a more efficient way to detect the classic locale in {{\_\_rw_locale::_C_manage()}} when invoked from {{locale::~locale()}}.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -859,7 +859,22 @@
         return tmp;
     }
 
+    if (plocale && plocale == classic) {
+        // optimize the "destruction" of the classic C locale
+        // the object is never destroyed and its reference count
+        // never drops to 0
+        _RWSTD_ASSERT (__rw_is_C (locname));
+        _RWSTD_ASSERT (__rw_is_C (plocale->_C_name));
 
+        const size_t ref =
+            _RWSTD_ATOMIC_PREDECREMENT (plocale->_C_ref, false);
+
+        _RWSTD_ASSERT (ref + 1U != 0);
+        _RWSTD_UNUSED (ref);
+
+        return 0;
+    }
+
     // re-entrant to protect static local data structures
     // (not the locales themselves)
     _RWSTD_MT_STATIC_GUARD (_RW::__rw_locale);
@@ -1066,6 +1081,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}
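
The idea behind the first hunk can be sketched outside the library roughly as follows. This is only an illustration in portable C++11, not the stdcxx code: {{Body}}, {{classic_body()}}, {{release()}}, {{repository_mutex}} and {{slow_path_calls}} are hypothetical names standing in for {{__rw_locale}}, the classic locale, {{_C_manage()}} and the guarded lookup that the patch bypasses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a reference-counted locale body.
struct Body {
    std::atomic<std::size_t> ref{1};
    bool is_classic;
    explicit Body(bool classic) : is_classic(classic) {}
};

// The classic object lives for the whole program and always keeps
// one reference of its own, so its count never drops to 0.
Body& classic_body() {
    static Body body(true);
    return body;
}

std::mutex repository_mutex;       // guards the slow path only
std::size_t slow_path_calls = 0;   // counts trips through the lock

// Drop one reference; returns true if the caller must destroy *p.
bool release(Body* p) {
    if (p == &classic_body()) {
        // fast path: one atomic decrement, no lock, no name lookup
        const std::size_t old = p->ref.fetch_sub(1);
        assert(old > 1);           // the permanent reference remains
        (void)old;
        return false;
    }

    // slow path: serialize access to the shared bookkeeping
    std::lock_guard<std::mutex> guard(repository_mutex);
    ++slow_path_calls;
    return p->ref.fetch_sub(1) == 1;
}
```

Releasing a reference to the object returned by {{classic_body()}} never takes the lock in this sketch, which is the kind of saving the patch aims for when {{locale::~locale()}} runs on a copy of the classic locale.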

With the patch applied, the top 12 list looks like so:
\\
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.70      0.97     0.97 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
 12.57      1.70     0.73 10000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
  8.43      2.19     0.49 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  7.06      2.60     0.41 10000001     0.00     0.00  std::string::operator=(std::string const&)
  6.45      2.98     0.38 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  5.34      3.29     0.31 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  4.65      3.56     0.27                             main
  4.30      3.81     0.25 10000000     0.00     0.00  std::basic_ostream<char, std::char_traits<char> >& __rw::__rw_insert<char, std::char_traits<char>, long>(std::basic_ostream<char, std::char_traits<char> >&, long)
  3.27      4.00     0.19 10000000     0.00     0.00  std::locale::locale(std::locale const&)
  3.01      4.17     0.18 10000000     0.00     0.00  std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str(char const*, unsigned long)
  2.75      4.33     0.16 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
  2.75      4.49     0.16 30000000     0.00     0.00  std::locale::~locale()
{noformat}


      was (Author: sebor):
    Here's a superficially tested patch to optimize {{__rw_locale::_C_is_managed()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It improves the performance of the test case by about 25% by avoiding expensive tests for named facets in the "C" locale.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -1066,6 +1066,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

With the patch applied, the top 10 list looks like so:
\\
\\
{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 24.54      1.45     1.45 50000001     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  9.48      2.01     0.56 10000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
  7.11      2.43     0.42 10000001     0.00     0.00  std::money_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, int, std::ios_base&, char, char const*, unsigned long, int, char const*, unsigned long) const
  6.94      2.84     0.41 40000003     0.00     0.00  std::locale::_C_get_std_facet(__rw::__rw_facet::_C_facet_type, __rw::__rw_facet* (*)(unsigned long, char const*)) const
  6.77      3.24     0.40 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  5.58      3.57     0.33 10000001     0.00     0.00  __rw::__rw_itoa(char*, unsigned long long, unsigned int)
  4.91      3.86     0.29 10000000     0.00     0.00  std::basic_ostream<char, std::char_traits<char> >& __rw::__rw_insert<char, std::char_traits<char>, long>(std::basic_ostream<char, std::char_traits<char> >&, long)
  4.57      4.13     0.27                             std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
  3.13      4.32     0.19 10000000     0.00     0.00  std::string::replace(unsigned long, unsigned long, char const*, unsigned long)
  2.88      4.49     0.17 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
{noformat}

  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Updated: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-914:
--------------------------------

    Attachment: stdcxx-914.gprof

Attached full gprof output for a library and test +this time+ both +really+ compiled with -D_RWSTD_USE_STRING_ATOMIC_OPS on the command line (see STDCXX-162).

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914.gprof
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Commented: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646021#action_12646021 ] 

Martin Sebor commented on STDCXX-914:
-------------------------------------

Here's a superficially tested patch to optimize {{__rw_locale::_C_is_managed()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It improves the performance of the test case by about 25% by avoiding expensive tests for named facets in the "C" locale.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -1066,6 +1066,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594770#action_12594770 ] 

sebor edited comment on STDCXX-914 at 5/7/08 7:41 AM:
-------------------------------------------------------------

gprof flat profile for a 15D build with gcc 4.3.0 on x86_64:
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 33.40      0.08     0.08  1000000    80.15   200.37  std::string lex_cast<std::string, long>(long const&)
 16.70      0.12     0.04  1000000    40.07    90.17  std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream()
 16.70      0.16     0.04                             main
 12.52      0.19     0.03  1000000    30.06    40.07  std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
  8.35      0.21     0.02  1000000    20.04    20.04  std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*)
  4.17      0.22     0.01  1000000    10.02    10.02  std::basic_ostream<char, std::char_traits<char> >::~basic_ostream()
  4.17      0.23     0.01  1000000    10.02    30.06  std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode)
  4.17      0.24     0.01  1000000    10.02    10.02  std::basic_ios<char, std::char_traits<char> >::~basic_ios()
  0.00      0.24     0.00 14000002     0.00     0.00  data_start
  0.00      0.24     0.00  1000001     0.00     0.00  std::allocator<char>::allocator()
  0.00      0.24     0.00  1000000     0.00     0.00  __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  0.00      0.24     0.00  1000000     0.00     0.00  std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const
  0.00      0.24     0.00  1000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >::~basic_istream()
{noformat}

gprof call graph:
{noformat}
granularity: each sample hit covers 2 byte(s) for 4.16% of 0.24 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04    0.20                 main [1]
                0.08    0.12 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
                0.00    0.00 3000002/14000002     data_start [9]
                0.00    0.00       1/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.08    0.12 1000000/1000000     main [1]
[2]     83.3    0.08    0.12 1000000         std::string lex_cast<std::string, long>(long const&) [2]
                0.04    0.05 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
                0.01    0.02 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 5000000/14000002     data_start [9]
                0.00    0.00 1000000/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.04    0.05 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[3]     37.5    0.04    0.05 1000000         std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
                0.03    0.01 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
                0.01    0.00 1000000/1000000     std::basic_ios<char, std::char_traits<char> >::~basic_ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.03    0.01 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
[4]     16.7    0.03    0.01 1000000         std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
                0.01    0.00 1000000/1000000     std::basic_ostream<char, std::char_traits<char> >::~basic_ostream() [7]
                0.00    0.00 1000000/1000000     std::basic_istream<char, std::char_traits<char> >::~basic_istream() [18]
-----------------------------------------------
                0.01    0.02 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[5]     12.5    0.01    0.02 1000000         std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
                0.02    0.00 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
                0.00    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const [17]
-----------------------------------------------
                0.02    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
[6]      8.3    0.02    0.00 1000000         std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
[7]      4.2    0.01    0.00 1000000         std::basic_ostream<char, std::char_traits<char> >::~basic_ostream() [7]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
[8]      4.2    0.01    0.00 1000000         std::basic_ios<char, std::char_traits<char> >::~basic_ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.00    0.00 1000000/14000002     std::basic_ios<char, std::char_traits<char> >::~basic_ios() [8]
                0.00    0.00 1000000/14000002     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
                0.00    0.00 2000000/14000002     std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*) [6]
                0.00    0.00 2000000/14000002     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 3000002/14000002     main [1]
                0.00    0.00 5000000/14000002     std::string lex_cast<std::string, long>(long const&) [2]
[9]      0.0    0.00    0.00 14000002         data_start [9]
-----------------------------------------------
                0.00    0.00       1/1000001     main [1]
                0.00    0.00 1000000/1000001     std::string lex_cast<std::string, long>(long const&) [2]
[15]     0.0    0.00    0.00 1000001         std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::allocator<__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> > >::destroy(__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >*) [19]
[16]     0.0    0.00    0.00 1000000         __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref() [16]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
[17]     0.0    0.00    0.00 1000000         std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const [17]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
[18]     0.0    0.00    0.00 1000000         std::basic_istream<char, std::char_traits<char> >::~basic_istream() [18]
-----------------------------------------------
{noformat}
and index by function name:
{noformat}
   [2] std::string lex_cast<std::string, long>(long const&)
   [4] std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
   [8] std::basic_ios<char, std::char_traits<char> >::~basic_ios()
  [16] __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  [18] std::basic_istream<char, std::char_traits<char> >::~basic_istream()
   [9] data_start
  [17] std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const
   [7] std::basic_ostream<char, std::char_traits<char> >::~basic_ostream()
   [1] main
  [15] std::allocator<char>::allocator()
   [5] std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode)
   [6] std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*)
   [3] std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream()
{noformat}
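
The test case itself is not included in this comment. Judging only by the symbols in the profile ({{lex_cast<std::string, long>}} called 1000000 times from {{main}}, each call constructing and destroying a {{basic_stringstream}}), it presumably resembles the sketch below. The body of {{lex_cast}} and the {{run()}} driver are assumptions made for illustration, not the original source.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical reconstruction: convert by inserting into and
// extracting from a temporary stringstream.
template <class To, class From>
To lex_cast(const From& from) {
    std::stringstream strm;   // constructed and destroyed on every call
    strm << from;
    To to;
    strm >> to;
    return to;
}

// Hypothetical driver: n would be 1000000 to match the call counts
// in the profile; the sum keeps the calls from being optimized away.
std::size_t run(long n) {
    std::size_t total = 0;
    for (long i = 0; i != n; ++i)
        total += lex_cast<std::string>(i).size();
    return total;
}
```

Because the stream is a local variable, every conversion pays for one stream constructor and one destructor, which is why those functions dominate the profile.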

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914.gprof
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Assigned: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor reassigned STDCXX-914:
-----------------------------------

    Assignee: Martin Sebor

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Assignee: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Updated: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-914:
--------------------------------

    Attachment:     (was: stdcxx-914.gprof)

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914.gprof
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594773#action_12594773 ] 

sebor edited comment on STDCXX-914 at 5/6/08 9:33 PM:
-------------------------------------------------------------

Attached full gprof output for a library and test +this time+ both +really+ compiled with {{-D_RWSTD_USE_STRING_ATOMIC_OPS}} on the command line (see STDCXX-162).

      was (Author: sebor):
    Attached full gprof output for a library and test +this time+ both +really+ compiled with -D_RWSTD_USE_STRING_ATOMIC_OPS on the command line (see STDCXX-162).
  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914.gprof
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Commented: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646015#action_12646015 ] 

Martin Sebor commented on STDCXX-914:
-------------------------------------

Here are the top 10 functions from the {{[stdcxx-914-gprof-gcc-4.3.0-12S.txt|https://issues.apache.org/jira/secure/attachment/12393570/stdcxx-914-gprof-gcc-4.3.0-12S.txt]}} attachment:

{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.40      2.99     2.99 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
 13.98      4.28     1.29 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  6.83      4.91     0.63 10000000     0.00     0.00  std::istream& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::istream&, std::string&)
  4.98      5.37     0.46 10000001     0.00     0.00  std::string::operator=(std::string const&)
  4.98      5.83     0.46 10000000     0.00     0.00  std::ostream& __rw::__rw_insert<char, std::char_traits<char>, long>(std::ostream&, long)
  4.66      6.26     0.43 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  3.36      6.57     0.31 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  3.25      6.87     0.30 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  3.03      7.15     0.28                             main
  2.49      7.38     0.23 30000000     0.00     0.00  std::locale::~locale()
  ...
{noformat}
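
For readers without the attachment, the function names in the profile suggest a test case along the following lines. This is a guess reconstructed from the symbols above ({{lex_cast<std::string, long>}}, {{operator>>}} into a {{std::string}}, insertion of a {{long}}, and {{std::string::operator=}}), not the actual test program. The helper {{run_lex_cast_loop}} is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical reconstruction of lex_cast: convert by writing the value to
// a string stream and reading it back as the target type.
template <class To, class From>
To lex_cast(const From& from) {
    std::stringstream ss;   // constructed and destroyed on every call
    ss << from;
    To to;
    ss >> to;
    return to;
}

// Calls lex_cast n times and returns the total length of the results,
// so the loop has an observable effect.
std::size_t run_lex_cast_loop(long n) {
    std::size_t total = 0;
    std::string s;
    for (long i = 0; i < n; ++i) {
        s = lex_cast<std::string>(i);
        total += s.size();
    }
    return total;
}
```

Each call constructs and destroys a {{std::stringstream}}, which is why stream constructors, destructors, and the locale reference counting they trigger dominate the profile.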

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646021#action_12646021 ] 

sebor edited comment on STDCXX-914 at 11/8/08 4:21 PM:
--------------------------------------------------------------

Here's a superficially tested patch to optimize {{\_\_rw_locale::_C_is_managed()}} and {{\_\_rw_locale::_C_manage()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It cuts the run time of the test case by about 36% (down from 18.905s to 12.147s on an Intel Core 2 6600 running at 2.40GHz) by having {{\_\_rw_locale::_C_is_managed()}} avoid expensive tests for named facets in the "C" locale and by using a more efficient way to detect the classic locale in {{\_\_rw_locale::_C_manage()}} when invoked from {{locale::~locale()}}.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -859,7 +859,22 @@
         return tmp;
     }
 
+    if (plocale && plocale == classic) {
+        // optimize the "destruction" of the classic C locale
+        // the object is never destroyed and its reference count
+        // never drops to 0
+        _RWSTD_ASSERT (__rw_is_C (locname));
+        _RWSTD_ASSERT (__rw_is_C (plocale->_C_name));
 
+        const size_t ref =
+            _RWSTD_ATOMIC_PREDECREMENT (plocale->_C_ref, false);
+
+        _RWSTD_ASSERT (ref + 1U != 0);
+        _RWSTD_UNUSED (ref);
+
+        return 0;
+    }
+
     // re-entrant to protect static local data structures
     // (not the locales themselves)
     _RWSTD_MT_STATIC_GUARD (_RW::__rw_locale);
@@ -1066,6 +1081,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

With the patch applied, the top 12 list looks like so:
\\
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.70      0.97     0.97 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
 12.57      1.70     0.73 10000000     0.00     0.00  std::istream& std::operator>>(std::istream&, std::string&)
  8.43      2.19     0.49 10000000     0.00     0.00  std::num_put::_C_put(std::ostreambuf_iterator, std::ios_base&, char, int, void const*) const
  7.06      2.60     0.41 10000001     0.00     0.00  std::string::operator=(std::string const&)
  6.45      2.98     0.38 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  5.34      3.29     0.31 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  4.65      3.56     0.27                             main
  4.30      3.81     0.25 10000000     0.00     0.00  std::ostream& __rw::__rw_insert(std::ostream&, long)
  3.27      4.00     0.19 10000000     0.00     0.00  std::locale::locale(std::locale const&)
  3.01      4.17     0.18 10000000     0.00     0.00  std::stringbuf::str(char const*, unsigned long)
  2.75      4.33     0.16 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
  2.75      4.49     0.16 30000000     0.00     0.00  std::locale::~locale()
{noformat}
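
The first hunk of the patch above relies on a general idea: an object that is never destroyed can have its reference count dropped without taking the lock that protects the shared bookkeeping. The sketch below shows that idea in isolation, in C++11. All names ({{counted_body}}, {{classic_body}}, {{registry_mutex}}, {{release}}, {{slow_path_calls}}) are hypothetical and none of this is stdcxx code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical reference-counted body; this is NOT the stdcxx __rw_locale.
struct counted_body {
    std::atomic<long> refs{1};
};

// The one body that lives for the whole program (the "classic" object).
counted_body& classic_body() {
    static counted_body body;
    return body;
}

// Lock protecting bookkeeping that is shared by all other bodies.
std::mutex& registry_mutex() {
    static std::mutex m;
    return m;
}

// Counts how often the slow, locked path ran (for demonstration only).
std::atomic<long> slow_path_calls{0};

// Drops one reference. Returns true if the call destroyed the body.
bool release(counted_body* body) {
    if (body == &classic_body()) {
        // Fast path: the classic body is never destroyed, so an atomic
        // decrement is all that is needed and the lock can be skipped.
        body->refs.fetch_sub(1, std::memory_order_acq_rel);
        return false;
    }

    // Slow path: serialize access to the shared bookkeeping.
    std::lock_guard<std::mutex> guard(registry_mutex());
    ++slow_path_calls;
    if (body->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete body;
        return true;
    }
    return false;
}
```

The pointer comparison costs far less than acquiring a lock, which is consistent with the drop in self time for {{_C_manage()}} in the profile above.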


      was (Author: sebor):
    Here's a superficially tested patch to optimize {{\_\_rw_locale::_C_is_managed()}} and {{\_\_rw_locale::_C_manage()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It cuts the run time of the test case by about 36% (down from 18.905s to 12.147s on an Intel Core 2 6600 running at 2.40GHz) by having {{\_\_rw_locale::_C_is_managed()}} avoid expensive tests for named facets in the "C" locale and by using a more efficient way to detect the classic locale in {{\_\_rw_locale::_C_manage()}} when invoked from {{locale::~locale()}}.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -859,7 +859,22 @@
         return tmp;
     }
 
+    if (plocale && plocale == classic) {
+        // optimize the "destruction" of the classic C locale
+        // the object is never destroyed and its reference count
+        // never drops to 0
+        _RWSTD_ASSERT (__rw_is_C (locname));
+        _RWSTD_ASSERT (__rw_is_C (plocale->_C_name));
 
+        const size_t ref =
+            _RWSTD_ATOMIC_PREDECREMENT (plocale->_C_ref, false);
+
+        _RWSTD_ASSERT (ref + 1U != 0);
+        _RWSTD_UNUSED (ref);
+
+        return 0;
+    }
+
     // re-entrant to protect static local data structures
     // (not the locales themselves)
     _RWSTD_MT_STATIC_GUARD (_RW::__rw_locale);
@@ -1066,6 +1081,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

With the patch applied, the top 12 list looks like so:
\\
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.70      0.97     0.97 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
 12.57      1.70     0.73 10000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
  8.43      2.19     0.49 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  7.06      2.60     0.41 10000001     0.00     0.00  std::string::operator=(std::string const&)
  6.45      2.98     0.38 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  5.34      3.29     0.31 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  4.65      3.56     0.27                             main
  4.30      3.81     0.25 10000000     0.00     0.00  std::basic_ostream<char, std::char_traits<char> >& __rw::__rw_insert<char, std::char_traits<char>, long>(std::basic_ostream<char, std::char_traits<char> >&, long)
  3.27      4.00     0.19 10000000     0.00     0.00  std::locale::locale(std::locale const&)
  3.01      4.17     0.18 10000000     0.00     0.00  std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str(char const*, unsigned long)
  2.75      4.33     0.16 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
  2.75      4.49     0.16 30000000     0.00     0.00  std::locale::~locale()
{noformat}

  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646021#action_12646021 ] 

sebor edited comment on STDCXX-914 at 11/8/08 11:02 AM:
---------------------------------------------------------------

Here's a superficially tested patch to optimize {{__rw_locale::_C_is_managed()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It improves the performance of the test case by about 25% by avoiding expensive tests for named facets in the "C" locale.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -1066,6 +1066,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

With the patch applied, the top 10 list looks like so:
\\
\\
{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 24.54      1.45     1.45 50000001     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  9.48      2.01     0.56 10000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
  7.11      2.43     0.42 10000001     0.00     0.00  std::money_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, int, std::ios_base&, char, char const*, unsigned long, int, char const*, unsigned long) const
  6.94      2.84     0.41 40000003     0.00     0.00  std::locale::_C_get_std_facet(__rw::__rw_facet::_C_facet_type, __rw::__rw_facet* (*)(unsigned long, char const*)) const
  6.77      3.24     0.40 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  5.58      3.57     0.33 10000001     0.00     0.00  __rw::__rw_itoa(char*, unsigned long long, unsigned int)
  4.91      3.86     0.29 10000000     0.00     0.00  std::basic_ostream<char, std::char_traits<char> >& __rw::__rw_insert<char, std::char_traits<char>, long>(std::basic_ostream<char, std::char_traits<char> >&, long)
  4.57      4.13     0.27                             std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
  3.13      4.32     0.19 10000000     0.00     0.00  std::string::replace(unsigned long, unsigned long, char const*, unsigned long)
  2.88      4.49     0.17 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
{noformat}
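
The early return added by the patch is a bitmask comparison placed ahead of the more expensive per-category checks. The sketch below shows the shape of such a test. The mask layout, the constant names, and the struct {{facet_masks}} are assumptions made for illustration; they do not reflect the actual meaning or values of {{_C_std_facet_bits}} and {{_C_byname_facet_bits}} in stdcxx.

```cpp
#include <cassert>

// Hypothetical category bits, one per locale category.
constexpr unsigned cat_collate  = 1u << 0;
constexpr unsigned cat_ctype    = 1u << 1;
constexpr unsigned cat_monetary = 1u << 2;
constexpr unsigned cat_numeric  = 1u << 3;
constexpr unsigned cat_time     = 1u << 4;
constexpr unsigned cat_messages = 1u << 5;
constexpr unsigned cat_all      = (1u << 6) - 1;

// Hypothetical summary of which facets a locale body holds.
struct facet_masks {
    unsigned std_bits;     // categories holding a standard facet
    unsigned byname_bits;  // categories holding a facet from a named locale
};

// True when every category holds a standard facet and none holds a named
// one, in which case no per-category inspection is needed.
inline bool looks_like_classic(const facet_masks& m) {
    return m.std_bits == cat_all && m.byname_bits == 0;
}
```

Two integer comparisons replace a loop over categories in the common case, and any locale that fails the test still falls through to the full check.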


      was (Author: sebor):
    Here's a superficially tested patch to optimize {{__rw_locale::_C_is_managed()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}. It improves the performance of the test case by about 25% by avoiding expensive tests for named facets in the "C" locale.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -1066,6 +1066,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}
  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Updated: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-914:
--------------------------------

    Attachment: stdcxx-914-gprof-gcc-4.3.0-12S.txt

gprof output for the head of trunk compiled with gcc 4.3.0 in a 12S build.

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.3.0-12S.txt, stdcxx-914.gprof
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594770#action_12594770 ] 

sebor edited comment on STDCXX-914 at 11/8/08 4:16 PM:
--------------------------------------------------------------

gprof flat profile for a 15D build with gcc 4.3.0 on x86_64 (with most template arguments removed for brevity):
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 33.40      0.08     0.08  1000000    80.15   200.37  std::string lex_cast<std::string, long>(long const&)
 16.70      0.12     0.04  1000000    40.07    90.17  std::stringstream::~stringstream()
 16.70      0.16     0.04                             main
 12.52      0.19     0.03  1000000    30.06    40.07  std::iostream::~iostream()
  8.35      0.21     0.02  1000000    20.04    20.04  std::iostream::iostream(std::streambuf*)
  4.17      0.22     0.01  1000000    10.02    10.02  std::ostream::~ostream()
  4.17      0.23     0.01  1000000    10.02    30.06  std::stringstream::stringstream(__rw::__rw_openmode)
  4.17      0.24     0.01  1000000    10.02    10.02  std::ios::~ios()
  0.00      0.24     0.00 14000002     0.00     0.00  data_start
  0.00      0.24     0.00  1000001     0.00     0.00  std::allocator<char>::allocator()
  0.00      0.24     0.00  1000000     0.00     0.00  __rw::__string_ref::~__string_ref()
  0.00      0.24     0.00  1000000     0.00     0.00  std::stringstream::rdbuf() const
  0.00      0.24     0.00  1000000     0.00     0.00  std::istream::~istream()
{noformat}

gprof call graph:
{noformat}
granularity: each sample hit covers 2 byte(s) for 4.16% of 0.24 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04    0.20                 main [1]
                0.08    0.12 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
                0.00    0.00 3000002/14000002     data_start [9]
                0.00    0.00       1/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.08    0.12 1000000/1000000     main [1]
[2]     83.3    0.08    0.12 1000000         std::string lex_cast<std::string, long>(long const&) [2]
                0.04    0.05 1000000/1000000     std::stringstream::~stringstream() [3]
                0.01    0.02 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 5000000/14000002     data_start [9]
                0.00    0.00 1000000/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.04    0.05 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[3]     37.5    0.04    0.05 1000000         std::stringstream::~stringstream() [3]
                0.03    0.01 1000000/1000000     std::iostream::~iostream() [4]
                0.01    0.00 1000000/1000000     std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.03    0.01 1000000/1000000     std::stringstream::~stringstream() [3]
[4]     16.7    0.03    0.01 1000000         std::iostream::~iostream() [4]
                0.01    0.00 1000000/1000000     std::ostream::~ostream() [7]
                0.00    0.00 1000000/1000000     std::istream::~istream() [18]
-----------------------------------------------
                0.01    0.02 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[5]     12.5    0.01    0.02 1000000         std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.02    0.00 1000000/1000000     std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
                0.00    0.00 1000000/1000000     std::stringstream::rdbuf() const [17]
-----------------------------------------------
                0.02    0.00 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
[6]      8.3    0.02    0.00 1000000         std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::iostream::~iostream() [4]
[7]      4.2    0.01    0.00 1000000         std::ostream::~ostream() [7]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::stringstream::~stringstream() [3]
[8]      4.2    0.01    0.00 1000000         std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.00    0.00 1000000/14000002     std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     std::stringstream::~stringstream() [3]
                0.00    0.00 2000000/14000002     std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 3000002/14000002     main [1]
                0.00    0.00 5000000/14000002     std::string lex_cast<std::string, long>(long const&) [2]
[9]      0.0    0.00    0.00 14000002         data_start [9]
-----------------------------------------------
                0.00    0.00       1/1000001     main [1]
                0.00    0.00 1000000/1000001     std::string lex_cast<std::string, long>(long const&) [2]
[15]     0.0    0.00    0.00 1000001         std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::allocator<__rw::__string_ref>::destroy(__rw::__string_ref*) [19]
[16]     0.0    0.00    0.00 1000000         __rw::__string_ref::~__string_ref() [16]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
[17]     0.0    0.00    0.00 1000000         std::stringstream::rdbuf() const [17]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::iostream::~iostream() [4]
[18]     0.0    0.00    0.00 1000000         std::istream::~istream() [18]
-----------------------------------------------
{noformat}
and index by function name:
{noformat}
   [2] std::string lex_cast<std::string, long>(long const&)
   [4] std::iostream::~iostream()
   [8] std::ios::~ios()
  [16] __rw::__string_ref::~__string_ref()
  [18] std::istream::~istream()
   [9] data_start
  [17] std::stringstream::rdbuf() const
   [7] std::ostream::~ostream()
   [1] main
  [15] std::allocator<char>::allocator()
   [5] std::stringstream::stringstream(__rw::__rw_openmode)
   [6] std::iostream::iostream(std::streambuf*)
   [3] std::stringstream::~stringstream()
{noformat}

      was (Author: sebor):
    gprof flat profile for a 15D build with gcc 4.3.0 on x86_64:
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 33.40      0.08     0.08  1000000    80.15   200.37  std::string lex_cast<std::string, long>(long const&)
 16.70      0.12     0.04  1000000    40.07    90.17  std::stringstream::~stringstream()
 16.70      0.16     0.04                             main
 12.52      0.19     0.03  1000000    30.06    40.07  std::iostream::~iostream()
  8.35      0.21     0.02  1000000    20.04    20.04  std::iostream::iostream(std::streambuf*)
  4.17      0.22     0.01  1000000    10.02    10.02  std::ostream::~ostream()
  4.17      0.23     0.01  1000000    10.02    30.06  std::stringstream::stringstream(__rw::__rw_openmode)
  4.17      0.24     0.01  1000000    10.02    10.02  std::ios::~ios()
  0.00      0.24     0.00 14000002     0.00     0.00  data_start
  0.00      0.24     0.00  1000001     0.00     0.00  std::allocator<char>::allocator()
  0.00      0.24     0.00  1000000     0.00     0.00  __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  0.00      0.24     0.00  1000000     0.00     0.00  std::stringstream::rdbuf() const
  0.00      0.24     0.00  1000000     0.00     0.00  std::istream::~istream()
{noformat}

gprof call graph:
{noformat}
granularity: each sample hit covers 2 byte(s) for 4.16% of 0.24 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04    0.20                 main [1]
                0.08    0.12 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
                0.00    0.00 3000002/14000002     data_start [9]
                0.00    0.00       1/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.08    0.12 1000000/1000000     main [1]
[2]     83.3    0.08    0.12 1000000         std::string lex_cast<std::string, long>(long const&) [2]
                0.04    0.05 1000000/1000000     std::stringstream::~stringstream() [3]
                0.01    0.02 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 5000000/14000002     data_start [9]
                0.00    0.00 1000000/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.04    0.05 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[3]     37.5    0.04    0.05 1000000         std::stringstream::~stringstream() [3]
                0.03    0.01 1000000/1000000     std::iostream::~iostream() [4]
                0.01    0.00 1000000/1000000     std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.03    0.01 1000000/1000000     std::stringstream::~stringstream() [3]
[4]     16.7    0.03    0.01 1000000         std::iostream::~iostream() [4]
                0.01    0.00 1000000/1000000     std::ostream::~ostream() [7]
                0.00    0.00 1000000/1000000     std::istream::~istream() [18]
-----------------------------------------------
                0.01    0.02 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[5]     12.5    0.01    0.02 1000000         std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.02    0.00 1000000/1000000     std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
                0.00    0.00 1000000/1000000     std::stringstream::rdbuf() const [17]
-----------------------------------------------
                0.02    0.00 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
[6]      8.3    0.02    0.00 1000000         std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::iostream::~iostream() [4]
[7]      4.2    0.01    0.00 1000000         std::ostream::~ostream() [7]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::stringstream::~stringstream() [3]
[8]      4.2    0.01    0.00 1000000         std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.00    0.00 1000000/14000002     std::ios::~ios() [8]
                0.00    0.00 1000000/14000002     std::stringstream::~stringstream() [3]
                0.00    0.00 2000000/14000002     std::iostream::iostream(std::streambuf*) [6]
                0.00    0.00 2000000/14000002     std::stringstream::stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 3000002/14000002     main [1]
                0.00    0.00 5000000/14000002     std::string lex_cast<std::string, long>(long const&) [2]
[9]      0.0    0.00    0.00 14000002         data_start [9]
-----------------------------------------------
                0.00    0.00       1/1000001     main [1]
                0.00    0.00 1000000/1000001     std::string lex_cast<std::string, long>(long const&) [2]
[15]     0.0    0.00    0.00 1000001         std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::allocator<__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> > >::destroy(__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >*) [19]
[16]     0.0    0.00    0.00 1000000         __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref() [16]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::stringstream::stringstream(__rw::__rw_openmode) [5]
[17]     0.0    0.00    0.00 1000000         std::stringstream::rdbuf() const [17]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::iostream::~iostream() [4]
[18]     0.0    0.00    0.00 1000000         std::istream::~istream() [18]
-----------------------------------------------
{noformat}
and index by function name:
{noformat}
   [2] std::string lex_cast<std::string, long>(long const&)
   [4] std::iostream::~iostream()
   [8] std::ios::~ios()
  [16] __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  [18] std::istream::~istream()
   [9] data_start
  [17] std::stringstream::rdbuf() const
   [7] std::ostream::~ostream()
   [1] main
  [15] std::allocator<char>::allocator()
   [5] std::stringstream::stringstream(__rw::__rw_openmode)
   [6] std::iostream::iostream(std::streambuf*)
   [3] std::stringstream::~stringstream()
{noformat}
  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Commented: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594770#action_12594770 ] 

Martin Sebor commented on STDCXX-914:
-------------------------------------

gprof flat profile for a 15D build with gcc 4.3.0 on x86_64:
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 33.40      0.08     0.08  1000000    80.15   200.37  std::string lex_cast<std::string, long>(long const&)
 16.70      0.12     0.04  1000000    40.07    90.17  std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream()
 16.70      0.16     0.04                             main
 12.52      0.19     0.03  1000000    30.06    40.07  std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
  8.35      0.21     0.02  1000000    20.04    20.04  std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*)
  4.17      0.22     0.01  1000000    10.02    10.02  std::basic_ostream<char, std::char_traits<char> >::~basic_ostream()
  4.17      0.23     0.01  1000000    10.02    30.06  std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode)
  4.17      0.24     0.01  1000000    10.02    10.02  std::basic_ios<char, std::char_traits<char> >::~basic_ios()
  0.00      0.24     0.00 14000002     0.00     0.00  data_start
  0.00      0.24     0.00  1000001     0.00     0.00  std::allocator<char>::allocator()
  0.00      0.24     0.00  1000000     0.00     0.00  __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  0.00      0.24     0.00  1000000     0.00     0.00  std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const
  0.00      0.24     0.00  1000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >::~basic_istream()
{noformat}

gprof call graph:
{noformat}
granularity: each sample hit covers 2 byte(s) for 4.16% of 0.24 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04    0.20                 main [1]
                0.08    0.12 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
                0.00    0.00 3000002/14000002     data_start [9]
                0.00    0.00       1/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.08    0.12 1000000/1000000     main [1]
[2]     83.3    0.08    0.12 1000000         std::string lex_cast<std::string, long>(long const&) [2]
                0.04    0.05 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
                0.01    0.02 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 5000000/14000002     data_start [9]
                0.00    0.00 1000000/1000001     std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.04    0.05 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[3]     37.5    0.04    0.05 1000000         std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
                0.03    0.01 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
                0.01    0.00 1000000/1000000     std::basic_ios<char, std::char_traits<char> >::~basic_ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.03    0.01 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
[4]     16.7    0.03    0.01 1000000         std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
                0.01    0.00 1000000/1000000     std::basic_ostream<char, std::char_traits<char> >::~basic_ostream() [7]
                0.00    0.00 1000000/1000000     std::basic_istream<char, std::char_traits<char> >::~basic_istream() [18]
-----------------------------------------------
                0.01    0.02 1000000/1000000     std::string lex_cast<std::string, long>(long const&) [2]
[5]     12.5    0.01    0.02 1000000         std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
                0.02    0.00 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
                0.00    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const [17]
-----------------------------------------------
                0.02    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
[6]      8.3    0.02    0.00 1000000         std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*) [6]
                0.00    0.00 2000000/14000002     data_start [9]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
[7]      4.2    0.01    0.00 1000000         std::basic_ostream<char, std::char_traits<char> >::~basic_ostream() [7]
-----------------------------------------------
                0.01    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
[8]      4.2    0.01    0.00 1000000         std::basic_ios<char, std::char_traits<char> >::~basic_ios() [8]
                0.00    0.00 1000000/14000002     data_start [9]
-----------------------------------------------
                0.00    0.00 1000000/14000002     std::basic_ios<char, std::char_traits<char> >::~basic_ios() [8]
                0.00    0.00 1000000/14000002     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream() [3]
                0.00    0.00 2000000/14000002     std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*) [6]
                0.00    0.00 2000000/14000002     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
                0.00    0.00 3000002/14000002     main [1]
                0.00    0.00 5000000/14000002     std::string lex_cast<std::string, long>(long const&) [2]
[9]      0.0    0.00    0.00 14000002         data_start [9]
-----------------------------------------------
                0.00    0.00       1/1000001     main [1]
                0.00    0.00 1000000/1000001     std::string lex_cast<std::string, long>(long const&) [2]
[15]     0.0    0.00    0.00 1000001         std::allocator<char>::allocator() [15]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::allocator<__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> > >::destroy(__rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >*) [19]
[16]     0.0    0.00    0.00 1000000         __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref() [16]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode) [5]
[17]     0.0    0.00    0.00 1000000         std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const [17]
-----------------------------------------------
                0.00    0.00 1000000/1000000     std::basic_iostream<char, std::char_traits<char> >::~basic_iostream() [4]
[18]     0.0    0.00    0.00 1000000         std::basic_istream<char, std::char_traits<char> >::~basic_istream() [18]
-----------------------------------------------
{noformat}
and index by function name:
{noformat}
   [2] std::string lex_cast<std::string, long>(long const&)
   [4] std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
   [8] std::basic_ios<char, std::char_traits<char> >::~basic_ios()
  [16] __rw::__string_ref<char, std::char_traits<char>, std::allocator<char> >::~__string_ref()
  [18] std::basic_istream<char, std::char_traits<char> >::~basic_istream()
   [9] data_start
  [17] std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::rdbuf() const
   [7] std::basic_ostream<char, std::char_traits<char> >::~basic_ostream()
   [1] main
  [15] std::allocator<char>::allocator()
   [5] std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(__rw::__rw_openmode)
   [6] std::basic_iostream<char, std::char_traits<char> >::basic_iostream(std::basic_streambuf<char, std::char_traits<char> >*)
   [3] std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream()
{noformat}
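
The profiled program is not part of this message. Judging by the call graph (main calls lex_cast<std::string, long> one million times, and each call constructs and destroys one std::stringstream), it was presumably something along the following lines. This is only a sketch: the body of lex_cast and the run_benchmark helper are assumptions, not the actual test case.

```cpp
#include <sstream>
#include <string>

// Assumed shape of the lex_cast seen in the profile: the conversion goes
// through a temporary stringstream, so every call pays for one stream
// constructor and one stream destructor.
template <class To, class From>
To lex_cast(const From& from)
{
    std::stringstream ss;   // ctor cost shows up on every call
    ss << from;             // format the source value
    To to;
    ss >> to;               // parse it back as the target type
    return to;
}                           // dtor cost shows up on every call

// Hypothetical driver: converts 0 .. n-1 and returns the last result.
std::string run_benchmark(long n)
{
    std::string last;
    for (long i = 0; i != n; ++i)
        last = lex_cast<std::string>(i);   // From is deduced as long
    return last;
}
```

Calling run_benchmark(1000000) from main would give the call counts shown above. Since the stream is never shared between threads, any per-stream mutex setup in the constructor is pure overhead in such a loop.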

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Updated: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Sebor updated STDCXX-914:
--------------------------------

    Attachment: stdcxx-914-gprof-gcc-4.1.2-12D.txt

Renamed the attachment {{stdcxx-914.gprof}} to {{stdcxx-914-gprof-gcc-4.1.2-12D.txt}} to indicate compiler and build type. This gprof output is missing data for the library.

> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 1h
>  Remaining Estimate: 11h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.



[jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes

Posted by "Martin Sebor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646015#action_12646015 ] 

sebor edited comment on STDCXX-914 at 11/8/08 4:18 PM:
--------------------------------------------------------------

Here are the top 10 functions from the {{[stdcxx-914-gprof-gcc-4.3.0-12S.txt|https://issues.apache.org/jira/secure/attachment/12393570/stdcxx-914-gprof-gcc-4.3.0-12S.txt]}} attachment (with template argument lists removed for better readability). Looks like the {{_C_is_managed()}} function could stand to be optimized...

{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.40      2.99     2.99 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
 13.98      4.28     1.29 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  6.83      4.91     0.63 10000000     0.00     0.00  std::istream& std::operator>>(std::istream&, std::string&)
  4.98      5.37     0.46 10000001     0.00     0.00  std::string::operator=(std::string const&)
  4.98      5.83     0.46 10000000     0.00     0.00  std::ostream& __rw::__rw_insert(std::ostream&, long)
  4.66      6.26     0.43 10000000     0.00     0.00  std::num_put::_C_put(std::ostreambuf_iterator, std::ios_base&, char, int, void const*) const
  3.36      6.57     0.31 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  3.25      6.87     0.30 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  3.03      7.15     0.28                             main
  2.49      7.38     0.23 30000000     0.00     0.00  std::locale::~locale()
  ...
{noformat}

      was (Author: sebor):
    Here are the top 10 functions from the {{[stdcxx-914-gprof-gcc-4.3.0-12S.txt|https://issues.apache.org/jira/secure/attachment/12393570/stdcxx-914-gprof-gcc-4.3.0-12S.txt]}} attachment. Looks like the {{_C_is_managed()}} function could stand to be optimized...

{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.40      2.99     2.99 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
 13.98      4.28     1.29 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  6.83      4.91     0.63 10000000     0.00     0.00  std::istream& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::istream&, std::string&)
  4.98      5.37     0.46 10000001     0.00     0.00  std::string::operator=(std::string const&)
  4.98      5.83     0.46 10000000     0.00     0.00  std::ostream& __rw::__rw_insert<char, std::char_traits<char>, long>(std::ostream&, long)
  4.66      6.26     0.43 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  3.36      6.57     0.31 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  3.25      6.87     0.30 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  3.03      7.15     0.28                             main
  2.49      7.38     0.23 30000000     0.00     0.00  std::locale::~locale()
  ...
{noformat}
  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] stream ctors in thread-safe builds are inefficient due to the initialization of the mutex data member in every stream, even in those that never use it. As soon as binary compatibility rules permit it we should remove the mutex and/or defer its initialization until it's needed. It might be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete removal will need to wait until 5.0.
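
The deferred initialization proposed in the description could look roughly like the sketch below. It is written in C++11 for brevity and is not stdcxx code: the class lazy_locked_stream and all of its members are hypothetical, and the real library would have to do this with its own mutex wrapper and within its binary compatibility constraints.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a stream base class that owns a mutex.
class lazy_locked_stream
{
    std::atomic<std::mutex*> mutex_;   // null until first needed
    int state_;

    // Returns the mutex, creating it on first use. If two threads race,
    // one allocation wins and the loser deletes its own copy.
    std::mutex& mutex()
    {
        std::mutex* m = mutex_.load(std::memory_order_acquire);
        if (!m) {
            std::mutex* fresh = new std::mutex;
            if (mutex_.compare_exchange_strong(m, fresh,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire))
                m = fresh;      // this thread installed the mutex
            else
                delete fresh;   // another thread won; m now points to its mutex
        }
        return *m;
    }

public:
    // The constructor only stores a null pointer; no mutex is created.
    lazy_locked_stream() : mutex_(nullptr), state_(0) {}

    ~lazy_locked_stream() { delete mutex_.load(std::memory_order_acquire); }

    // Diagnostic helper for the sketch: has the mutex been created yet?
    bool mutex_initialized() const
    {
        return mutex_.load(std::memory_order_acquire) != nullptr;
    }

    // An operation that needs the lock; it triggers creation on first call.
    void set_state(int s)
    {
        std::lock_guard<std::mutex> guard(mutex());
        state_ = s;
    }

    // Unsynchronized read, adequate for single-threaded illustration only.
    int state() const { return state_; }
};
```

With this arrangement a stream that is constructed, used by one thread without locking, and destroyed never pays for mutex initialization. The price is an atomic load on every locking operation and a heap allocation on the first one.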
