
Posted to issues@trafficserver.apache.org by "Leif Hedstrom (Created) (JIRA)" <ji...@apache.org> on 2012/01/17 22:27:40 UTC

[jira] [Created] (TS-1080) Assert under heavy load with logging enabled

Assert under heavy load with logging enabled
--------------------------------------------

                 Key: TS-1080
                 URL: https://issues.apache.org/jira/browse/TS-1080
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging
            Reporter: Leif Hedstrom
            Priority: Critical
             Fix For: 3.1.2


Given enough load (on the order of 100,000 QPS or more), we run out of some sort of buffer space and hit this assert:

{code}
#0  0x00002ba719d50285 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00002ba719d51b9b in __GI_abort () at abort.c:91
#2  0x00000000006b561a in ink_die_die_die (retval=<optimized out>) at ink_error.cc:43
#3  ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=<optimized out>, message_format=<optimized out>, ap=0x7fff7275a7d8) at ink_error.cc:65
#4  0x00000000006b56a7 in ink_fatal (return_code=<optimized out>, message_format=<optimized out>) at ink_error.cc:73
#5  0x00000000006b4970 in _ink_assert (a=0x6fd380 "_num_flush_buffers[_open_flush_array] < FLUSH_ARRAY_SIZE", f=<optimized out>, l=96) at ink_assert.cc:44
#6  0x00000000005a8b34 in add_to_flush_queue (buffer=0x2ba7443ca970, this=0x22fb918) at LogObject.h:96
#7  LogObject::_checkout_write (this=0x22fb880, write_offset=0x7fff7275add8, bytes_needed=152) at LogObject.cc:455
#8  0x00000000005a8fd3 in LogObject::log (this=0x22fb880, lad=0x7fff7275b030, text_entry=0x0) at LogObject.cc:580
#9  0x000000000058e956 in log (lad=0x7fff7275b030, this=<optimized out>) at LogObject.h:475
#10 Log::access (lad=0x7fff7275b030) at Log.cc:1086
{code}

Increasing FLUSH_ARRAY_SIZE alleviates the problem, but really, we shouldn't end up in this situation at all.
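The assertion text names the data structures involved. Here is a minimal, single-threaded sketch of a double-buffered flush queue that contains the same failing check. It assumes two flush arrays selected by _open_flush_array, which is what the assertion's indexing suggests. FlushQueueModel, FakeLogBuffer, open_array_full() and swap_and_drain() are hypothetical names, not Traffic Server's LogObject code, and FLUSH_ARRAY_SIZE is shrunk to 4 so the overflow is easy to see.

```cpp
#include <cassert>

// Hypothetical stand-in for a LogBuffer; only its address matters here.
struct FakeLogBuffer {};

// Single-threaded model of a double-buffered flush queue. The names
// _flush_buffers, _num_flush_buffers, _open_flush_array and FLUSH_ARRAY_SIZE
// echo the assertion in the trace; the class, its tiny capacity and
// swap_and_drain() are illustrative assumptions. Real code must also
// synchronize the producers with the flush thread.
class FlushQueueModel {
public:
  static constexpr int FLUSH_ARRAY_SIZE = 4;  // deliberately tiny

  // Producer side: hand a full log buffer to the flush thread.
  void add_to_flush_queue(FakeLogBuffer *buffer) {
    int idx = _open_flush_array;
    // The invariant that fired in TS-1080: producers filled every slot of
    // the open array before the flush thread swapped it out.
    // Note: with NDEBUG defined this check disappears and the store below
    // would overrun the array.
    assert(_num_flush_buffers[idx] < FLUSH_ARRAY_SIZE);
    _flush_buffers[idx][_num_flush_buffers[idx]++] = buffer;
  }

  bool open_array_full() const {
    return _num_flush_buffers[_open_flush_array] >= FLUSH_ARRAY_SIZE;
  }

  // Flush-thread side: make the other array the open one, then drain the
  // array the producers were filling. Returns how many buffers were drained.
  int swap_and_drain() {
    int idx = _open_flush_array;
    _open_flush_array = 1 - idx;
    int drained = _num_flush_buffers[idx];
    // A real flusher would write out _flush_buffers[idx][0 .. drained) here.
    _num_flush_buffers[idx] = 0;
    return drained;
  }

private:
  FakeLogBuffer *_flush_buffers[2][FLUSH_ARRAY_SIZE] = {};
  int _num_flush_buffers[2] = {0, 0};
  int _open_flush_array = 0;
};
```

In this model, calling add_to_flush_queue() more than FLUSH_ARRAY_SIZE times between two swap_and_drain() calls trips the assert. That is the situation a flush thread that falls behind the producers would create, and a larger array only raises the threshold.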


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (TS-1080) Assert under heavy load with logging enabled

Posted by "Leif Hedstrom (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom reassigned TS-1080:
---------------------------------

    Assignee: Leif Hedstrom
    
> Assert under heavy load with logging enabled
> --------------------------------------------
>
>                 Key: TS-1080
>                 URL: https://issues.apache.org/jira/browse/TS-1080
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Leif Hedstrom
>            Assignee: Leif Hedstrom
>            Priority: Critical
>             Fix For: 3.1.4

[jira] [Updated] (TS-1080) Assert under heavy load with logging enabled

Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-1080:
------------------------------

    Fix Version/s:     (was: 3.1.2)
                   3.1.3

Moving out to 3.1.3 for now.
                
> Assert under heavy load with logging enabled
> --------------------------------------------
>
>                 Key: TS-1080
>                 URL: https://issues.apache.org/jira/browse/TS-1080
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Leif Hedstrom
>            Priority: Critical
>             Fix For: 3.1.3

[jira] [Resolved] (TS-1080) Assert under heavy load with logging enabled

Posted by "Leif Hedstrom (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom resolved TS-1080.
-------------------------------

    Resolution: Fixed

Fixed in commit 3691e5dca658cc59885f803cc70c5616591d8b23
                
> Assert under heavy load with logging enabled
> --------------------------------------------
>
>                 Key: TS-1080
>                 URL: https://issues.apache.org/jira/browse/TS-1080
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Leif Hedstrom
>            Assignee: Leif Hedstrom
>            Priority: Critical
>             Fix For: 3.1.4

[jira] [Commented] (TS-1080) Assert under heavy load with logging enabled

Posted by "Zhao Yongming (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221534#comment-13221534 ] 

Zhao Yongming commented on TS-1080:
-----------------------------------

Hmm, in this case we have filled up the accepting queue with all 512*4 buffers before the flush thread has drained the flush queue. If we don't increase FLUSH_ARRAY_SIZE, we have 2 options:
1. Drop the buffer.
  This may be a good solution for me, as it is better than crashing.
2. Speed up the flush thread.
  This leads in another, more complex direction: can we add more flush threads? Or how can we flush faster when IO is limited?
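Option 1 can be sketched as a flush queue that refuses the buffer and counts the loss instead of asserting. This is a minimal, single-threaded illustration under assumed names (DroppingFlushQueue, FakeLogBuffer, dropped(), swap_and_drain()); it is not the fix that was committed for TS-1080, and a multi-threaded version would need a lock or atomics around the counters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a LogBuffer.
struct FakeLogBuffer {};

// Sketch of "drop the buffer": when the open flush array is full, reject
// the buffer and count it, rather than aborting the process.
class DroppingFlushQueue {
public:
  static constexpr int FLUSH_ARRAY_SIZE = 4;  // deliberately tiny

  // Returns true if the buffer was queued, false if it was dropped. On
  // false the caller still owns the buffer and must recycle or free it.
  bool add_to_flush_queue(FakeLogBuffer *buffer) {
    int idx = _open_flush_array;
    if (_num_flush_buffers[idx] >= FLUSH_ARRAY_SIZE) {
      ++_dropped;
      return false;
    }
    _flush_buffers[idx][_num_flush_buffers[idx]++] = buffer;
    return true;
  }

  // Flush-thread side: swap arrays and drain the one producers were
  // filling. Returns how many buffers were drained.
  int swap_and_drain() {
    int idx = _open_flush_array;
    _open_flush_array = 1 - idx;
    int drained = _num_flush_buffers[idx];
    _num_flush_buffers[idx] = 0;
    return drained;
  }

  std::uint64_t dropped() const { return _dropped; }

private:
  FakeLogBuffer *_flush_buffers[2][FLUSH_ARRAY_SIZE] = {};
  int _num_flush_buffers[2] = {0, 0};
  int _open_flush_array = 0;
  std::uint64_t _dropped = 0;
};
```

Dropping trades lost log entries for staying up, so exposing the drop count (for example as a stat) would let operators see when the logging pipeline is overloaded.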
                
> Assert under heavy load with logging enabled
> --------------------------------------------
>
>                 Key: TS-1080
>                 URL: https://issues.apache.org/jira/browse/TS-1080
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Leif Hedstrom
>            Priority: Critical
>             Fix For: 3.1.4

[jira] [Commented] (TS-1080) Assert under heavy load with logging enabled

Posted by "Leif Hedstrom (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240033#comment-13240033 ] 

Leif Hedstrom commented on TS-1080:
-----------------------------------

I agree, #1 is the "solution" for now. We might still want to bump up FLUSH_ARRAY_SIZE a bit. You guys are familiar with the log code; is it doable to get a fix that drops the buffers into v3.1.4?
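A back-of-the-envelope model shows why bumping FLUSH_ARRAY_SIZE buys headroom but cannot replace dropping. It treats the queue as one pool of slots and ignores the double-buffer swap. The capacity of 512*4 comes from the comment above; the fill and drain rates used below are purely hypothetical, and seconds_until_overflow() is an illustrative helper, not Traffic Server code.

```cpp
#include <cassert>
#include <limits>

// Simplified model: producers queue full buffers at fill_rate buffers/s and
// the flush thread retires them at drain_rate buffers/s. Returns the seconds
// until `capacity` buffers are outstanding, or +infinity if the flusher
// keeps up.
double seconds_until_overflow(double capacity, double fill_rate,
                              double drain_rate) {
  double net = fill_rate - drain_rate;  // backlog growth per second
  if (net <= 0.0) {
    return std::numeric_limits<double>::infinity();
  }
  return capacity / net;
}
```

With hypothetical rates of 1512 buffers/s in and 1000 buffers/s out, seconds_until_overflow(512 * 4, 1512, 1000) gives 4 seconds, and doubling the array only stretches that to 8. As long as the disk cannot sustain the fill rate, the queue still overflows, so a drop path is needed regardless of the array size.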
                
> Assert under heavy load with logging enabled
> --------------------------------------------
>
>                 Key: TS-1080
>                 URL: https://issues.apache.org/jira/browse/TS-1080
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>            Reporter: Leif Hedstrom
>            Priority: Critical
>             Fix For: 3.1.4