You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@trafficserver.apache.org by GitBox <gi...@apache.org> on 2023/01/13 16:28:26 UTC

[GitHub] [trafficserver] bdgranger opened a new pull request, #9303: Issue 9274: read-while-writer readers can't find fragment inserted

bdgranger opened a new pull request, #9303:
URL: https://github.com/apache/trafficserver/pull/9303

   When last_collision is set, dir_probe starts at the bucket head and skips all collisions until last_collision is reached, then continues the probe from that point. dir_insert always inserts new entries immediately after the bucket head. So unless last_collision is the bucket head, any new entries inserted with last_collision set will be inserted before last_collision and will never be found by rww readers.
   
   Modify dir_insert to always insert new entries at the tail of the list. This will entail walking the entire list once on each insert. However, the lists should be entirely in memory and quite short, so this should not present too much performance hit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] zwoop commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by "zwoop (via GitHub)" <gi...@apache.org>.
zwoop commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1439364275

   Cherry-picked to v9.2.x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] bdgranger commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
bdgranger commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1387455613

   @masaori335 the WARNINGs happened even before this code change, and are actually to be expected based on the  regression test implementation -- the "insert-delete" test intentionally inserts a Dir entry into every bucket to measure insert and probe rates, but never frees any of the entries. So when the next loop after the "probe rate" message is printed attempts to insert additional entries, it will not be able to find any and it needs to purge due to the directory overflow.  Does the test need to be modified to free some or all of the entries at the end of the insert-delete test, or is it okay to leave those warnings and see that the system deals with the condition appropriately? Either way, that is probably separate from this PR.
   
   However, the hang does appear to be from the code change.  It appears that during the corrupt_bucket test loop, a bucket is selected that was already intentionally corrupted from one of the previous iterations (seems strange that with only 10 loops we would end up back at the same bucket with a rand_CacheKey call), and then the new code that attempts to walk to the end of the list to do the insert gets stuck in an infinite loop.  Need to find the best way to add some protection against that without destroying performance.
   
   Is there any way to stop the build that is still spinning since 3 days ago? I don't appear to have the Jenkins permissions to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] masaori335 commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
masaori335 commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1396492679

   I realized the WARNING shows up with debug build on the master branch. (I was trying the test with the release build). Thanks for pointing that out. I agree this should be done in a separate PR.
   
   Let's figure out to avoid the hang. @SolidWallOfCode, any idea?
   
   Evan stopped the build. I got the privileges, so if we got the hang again, I'd stop them next time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] ezelkow1 commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1396214730

   [approve ci]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] SolidWallOfCode commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by "SolidWallOfCode (via GitHub)" <gi...@apache.org>.
SolidWallOfCode commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1415991104

   Given the general size of caches these days, dropping a segment isn't that much of the cache. The general view of the logic is that since it's a cache, none of the data is essential and evicting it is acceptable. The lost fragments will eventually be cleaned up when the cache wraps back to that location.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] zwoop commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by "zwoop (via GitHub)" <gi...@apache.org>.
zwoop commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1416791751

   Would this be a candidate for 9.2.x?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] masaori335 merged pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by "masaori335 (via GitHub)" <gi...@apache.org>.
masaori335 merged PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] masaori335 commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
masaori335 commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1384647911

   Thanks for addressing the issue. It looks like a regression test, Cache_dir, made warnings and hung with this change. 
   Please check the test.
   
   - Hanging CI Jobs
     - https://ci.trafficserver.apache.org/job/Github_Builds/job/debian/597/console
     - https://ci.trafficserver.apache.org/job/Github_Builds/job/rocky/1246/console
     - https://ci.trafficserver.apache.org/job/Github_Builds/job/fedora/607/console
   
   - Cache_dir regression test
   https://github.com/apache/trafficserver/blob/acae44903ae30d5af57d3d2071b24239b0adffe4/iocore/cache/CacheDir.cc#L1414
   
   - Run specific regression test.
   ```
   ยป /opt/ats-review-1/bin/traffic_server -K -R 1 -r Cache_dir
   Traffic Server 10.0.0 Jan 16 2023 11:42:32 Masaoris-MBP-16
   traffic_server: using root directory '/opt/ats-review-1'
   [Jan 17 07:48:57.329] [TS_MAIN] NOTE: Traffic Server is running unprivileged, not switching to user 'nobody'
   REGRESSION_TEST initialization begun
   REGRESSION TEST Cache_dir started
   RPRINT Cache_dir: clearing vol 0
   RPRINT Cache_dir: insert test
   RPRINT Cache_dir: free: 25098
   RPRINT Cache_dir: inserted: 25098
   RPRINT Cache_dir: delete test
   RPRINT Cache_dir: newfree: 25098
   RPRINT Cache_dir: insert-delete test
   RPRINT Cache_dir: insert rate = 6910242 / second
   RPRINT Cache_dir: probe rate = 8302348 / second
   [Jan 17 07:49:01.222] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.224] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.226] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.228] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.231] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.233] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.236] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   [Jan 17 07:49:01.238] [ET_NET 0] WARNING: cache directory overflow on '/opt/ats-review-1/var/trafficserver/cache.db' segment 0, purging...
   RPRINT Cache_dir: corrupt_bucket test
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] bdgranger commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
bdgranger commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1397651927

   I tried putting a loop counter into the dir_insert loop and set the value to 100, to match what was already in dir_bucket_length, however the regression test still failed because it is coded to add every single directory entry to the same bucket. It is certainly not expected that any given bucket will have over 100 entries in it, but in order to pass the test I will probably need to set the loop counter to something like 65536 in order to match the highest possible number of directory entries available in a segment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] masaori335 commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
masaori335 commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1396219356

   [approve ci fedora]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] masaori335 commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
masaori335 commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1383344608

   [approve ci]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [trafficserver] bdgranger commented on pull request #9303: Issue 9274: read-while-writer readers can't find fragment inserted

Posted by GitBox <gi...@apache.org>.
bdgranger commented on PR #9303:
URL: https://github.com/apache/trafficserver/pull/9303#issuecomment-1397091524

   I just need to add a loop counter in the list-walking code similar to what is in the dir_bucket_length() function to break out of the loop if it seems the list is getting too long.  Under normal circumstances this should never be hit, but it's good to have a failsafe.
   
   The follow-on question to that is what to do if we encounter a problem while walking the list? Do we attempt to "fix" the corrupt bucket by calling dir_bucket_loop_fix? That would restore the bucket but would do so by reinitializing the entire segment and invalidating a large portion of the cache.
   
   Another option would be simply to insert the new entry at the point that we broke out of the loop. This would resolve the loop by effectively null-terminating the list again, but would "strand" any directory entries after that point in the list so that they are not returned to the freelist and are no longer referenced in any linked list. This would make those entries unusable until some sort of directory structure reset in the future...
   
   Or I suppose we could treat it as an error case and simply return 0 without inserting the directory entry into the bucket. But nowhere else does dir_insert return 0, so I don't know how the caller would handle it.
   
   I will note that dir_probe also does not have loop protection in it, unless the code is compiled in Debug mode, so it's obvious that this type of bucket loop corruption must almost never happen or people would have noticed and complained about it long ago.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org