Posted to dev@trafficserver.apache.org by "John Plevyak (JIRA)" <ji...@apache.org> on 2009/11/14 19:52:39 UTC

[jira] Created: (TS-39) prior BZ59274 "fix" can result in a partition being cleared unnecessarily

prior BZ59274  "fix" can result in a partition being cleared unnecessarily
--------------------------------------------------------------------------

                 Key: TS-39
                 URL: https://issues.apache.org/jira/browse/TS-39
             Project: Traffic Server
          Issue Type: Bug
          Components: Cache
         Environment: All
            Reporter: John Plevyak
            Priority: Minor


The prior fix for BZ59274 clears the cache partition if recovery gets into a loop.  This can occur if last_write_pos == skip + len
(the end of the cache partition), because the code which wraps the write_pos does so only when it attempts
the next write.  The solution is to check for this at the top of recover and wrap recovery. Also, the variable which the prior patch used,
"prev_recover_pos", is stored in the CachePart even though it is a purely local variable.  I would suggest leaving in the check (it doesn't hurt
if it never detects a problem).  Patch forthcoming.
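
For illustration, the check described above could look like the following. This is a minimal sketch, not the attached patch: PartLayout and wrap_recover_pos are hypothetical names, and only start, skip and len are taken from this issue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the partition geometry.
// This is NOT the real CachePart; only the field names come from the issue.
struct PartLayout {
  int64_t start; // first byte of the data area, where the writer wraps to
  int64_t skip;  // offset of the partition
  int64_t len;   // length of the partition
};

// A saved position that sits exactly at the end of the partition means the
// writer had not yet wrapped it (it wraps only on the next write attempt).
// Recovery wraps it to the start instead of issuing a 0-byte read.
inline int64_t wrap_recover_pos(const PartLayout &p, int64_t pos) {
  assert(pos >= p.start && pos <= p.skip + p.len); // sketch assumes a sane position
  return pos == p.skip + p.len ? p.start : pos;
}
```

Calling such a helper once at the top of recover would let the existing loop check stay in place as a safety net.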

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (TS-39) prior BZ59274 "fix" can result in a partition being cleared unnecessarily

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-39:
---------------------------

    Attachment: cache-BZ59274.patch

This patch fixes the issue but leaves in the check for the infinite loop.
It also fixes some indentation and comments.

The FIXME comment is not correct, as the lock is always held when
the write_pos is updated.  Also, the ??? in the comment is unnecessary,
as the comment is correct.

> prior BZ59274  "fix" can result in a partition being cleared unnecessarily
> --------------------------------------------------------------------------
>
>                 Key: TS-39
>                 URL: https://issues.apache.org/jira/browse/TS-39
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>         Environment: All
>            Reporter: John Plevyak
>            Priority: Minor
>         Attachments: cache-BZ59274.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h

[jira] Commented: (TS-39) prior BZ59274 "fix" can result in a partition being cleared unnecessarily

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/TS-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777985#action_12777985 ] 

John Plevyak commented on TS-39:
--------------------------------


Here is the failure scenario:

... write document 1 to cache ...
Part::aggWrite: we get unlucky and write_pos + agg_buf_pos + writelen == skip + len
Part::aggWriteDone: write_pos += write size, last_write_pos = write_pos
CacheSync::mainEvent: ... snap of header where write_pos == skip + len
... write another document 2 to cache ...
Part::aggWrite
Part::agg_wrap() is called, write_pos = start
... sync complete ...
EXIT, disk now contains last_write_pos == start


... recovery ...
initial recovery_pos == skip + len
read of 0 bytes
repeat
prev_recovery_pos == recovery_pos, cache partition cleared.

Also I was wrong about prev_recover_pos being local.
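
The recovery half of the scenario above can be modelled in a few lines. This is a toy model under stated assumptions, not the Traffic Server code: Part here is a hypothetical three-field struct, the 4096-byte chunk size is arbitrary, and the model stops as soon as one read returns data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

enum class RecoverResult { Recovered, Cleared };

// Hypothetical, simplified partition geometry (not the real class).
struct Part {
  int64_t start; // where the writer wraps to
  int64_t skip;  // offset of the partition
  int64_t len;   // length of the partition
};

// Toy model of the recovery loop with the loop check from the prior fix.
// wrap_at_top selects whether the position is wrapped before the loop.
inline RecoverResult recover(const Part &p, int64_t last_write_pos, bool wrap_at_top) {
  const int64_t end   = p.skip + p.len;
  const int64_t chunk = 4096; // arbitrary read size for the model
  assert(p.start < end);

  int64_t recover_pos = last_write_pos;
  if (wrap_at_top && recover_pos == end)
    recover_pos = p.start; // the proposed check at the top of recover

  int64_t prev_recover_pos = -1;
  for (;;) {
    if (recover_pos == prev_recover_pos)
      return RecoverResult::Cleared; // no progress: partition is cleared
    prev_recover_pos = recover_pos;
    const int64_t n = std::min(chunk, end - recover_pos); // bytes this read returns
    if (n > 0)
      return RecoverResult::Recovered; // model ends once data was read
    // n == 0: read of 0 bytes, repeat with the same position
  }
}
```

In this model, a saved position equal to skip + len ends in Cleared without the wrap and in Recovered with it, while any position inside the partition is unaffected by the change.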


[jira] Assigned: (TS-39) prior BZ59274 "fix" can result in a partition being cleared unnecessarily

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak reassigned TS-39:
------------------------------

    Assignee: John Plevyak

[jira] Closed: (TS-39) prior BZ59274 "fix" can result in a partition being cleared unnecessarily

Posted by "John Plevyak (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/TS-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak closed TS-39.
--------------------------

    Resolution: Fixed
