You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafficserver.apache.org by "Bryan Call (JIRA)" <ji...@apache.org> on 2010/02/10 23:30:36 UTC

[jira] Created: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
------------------------------------------------------------------------------------------------------

                 Key: TS-162
                 URL: https://issues.apache.org/jira/browse/TS-162
             Project: Traffic Server
          Issue Type: Bug
          Components: Stats
    Affects Versions: 2.0.0a
            Reporter: Bryan Call


There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...

A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848447#action_12848447 ] 

John Plevyak commented on TS-162:
---------------------------------

I think there is  a race condition in this code if two instances of this code run at the same time.  First,
there is a race condition between the read of last_XXX and the write via swap and a possible write
on another thread.  Also there is a race between the two swaps and two other swaps on a different thread.
I think these can both be solved by using a CAS to update the last_XXX values and then using the
difference to update the sum/count via atomic increment.  While it is still possible that the count
and sum will be out of sync, that is always a problem, but at least updates will not get lost.

The sync problem is solvable as well with an additional loop during read and sequence numbers
of (updates started, updates ended) in a CAS updated variable attached to the global.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>         Attachments: stats_bcall_001.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "Bryan Call (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Call updated TS-162:
--------------------------

    Attachment: stats_bcall_001.diff

Fix so the thread that syncs the stats doesn't modify the local thread stats value.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>         Attachments: stats_bcall_001.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848471#action_12848471 ] 

John Plevyak commented on TS-162:
---------------------------------

I think it might be easier if the total just increased monotonically and the last_XXX was set to total in clear.
Then the read would have to do a total-last.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>         Attachments: stats_bcall_001.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-162:
-----------------------------

    Fix Version/s:     (was: 2.0.0a)
                   2.0.0

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-162:
-----------------------------

    Fix Version/s: 2.0.1
                       (was: 2.0.0)

Moving this out to 2.0.1 after discussing with Bryan, so we can get more testing done with these changes before release.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.1
>
>         Attachments: stats_bcall_001.diff, stats_bcall_002.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "Bryan Call (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Call updated TS-162:
--------------------------

    Attachment: stats_bcall_002.diff

Added a mutex around modifying the last and global values and updated the cache to use increment on the global stats.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>         Attachments: stats_bcall_001.diff, stats_bcall_002.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848840#action_12848840 ] 

John Plevyak commented on TS-162:
---------------------------------

The uses of RecGetGLobalRawStatSum in Cache.cc AIO_Callback_handler::handle_disk_failure
should be incr.

iocore/aio/AIO.cc has thread-unsafe statistics: global non-volatile variables updated
with normal increment then copied into the global stats.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>         Attachments: stats_bcall_001.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848762#action_12848762 ] 

John Plevyak commented on TS-162:
---------------------------------

Bryan suggested a mutex, and I think he is right.  Assuming that the reads from global and sum to global operations
are rare (not many times a second continuous) then a mutex is probably safest and clearest.

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0
>
>         Attachments: stats_bcall_001.diff
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-162) inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads

Posted by "Bryan Call (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Call updated TS-162:
--------------------------

    Affects Version/s:     (was: 2.0.0a)
        Fix Version/s: 2.0.0a
             Assignee: Bryan Call

> inconsistencies in the stats system - there are incorrect assumtions on sychronization between threads
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TS-162
>                 URL: https://issues.apache.org/jira/browse/TS-162
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Stats
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 2.0.0a
>
>
> There is a problem with the current stats being inconsistent.  The problem is that the net threads each have their own stats which they modify and the aggregation thread also modifies the same stat value.  There is no proper synchronization for this to work correctly.  The aggregation thread does an atomic swap to zero the net threads stat value, but the local thread doesn't use atomics and only increments the value.  This will lead to inconsistencies...
> A better design is to *only* have the net threads modify their local stats and have the aggregation thread handle totalling the values in a separate stat structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.