Posted to commits@samza.apache.org by "prateekm (via GitHub)" <gi...@apache.org> on 2023/03/06 19:57:58 UTC

[GitHub] [samza] prateekm opened a new pull request, #1654: Side Inputs + Blob Store Backups - Part 1 - Fix duplicate uploads and broken metrics

prateekm opened a new pull request, #1654:
URL: https://github.com/apache/samza/pull/1654

   Part 1 of 2. A follow-up PR (part 2) to restore side input stores using Blob Store backups is coming soon.
   
   Symptoms: 
   1. Side input stores are uploaded twice when using the Blob Store state backend.
   2. Store-level Gauges (but not Timers) in BlobStoreBackupManagerMetrics are broken for side input stores.
   3. Task-level Gauges in BlobStoreBackupManagerMetrics have incorrect values (side input stores are counted twice).
   
   Cause: 
   1. StorageConfig#getStoreNames() returns side input stores twice in the list. 
   2. BlobStoreBackupManager does not dedup its storesToBackup list.
   3. PR #1223 makes the duplicate-registration behavior between Gauges and Timers inconsistent.
    
   Changes: 
   1. Fixed StorageConfig#getStoreNames() to dedup store names.
   2. Added defensive dedup in BlobStoreBackupManager.
   3. Changed store-level metrics initialization in BlobStoreBackupManagerMetrics to use computeIfAbsent instead of putIfAbsent. This avoids overwriting the registered Gauge while still returning the old one when store names are duplicated.
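
   To illustrate why the choice between putIfAbsent and computeIfAbsent matters for change 3, here is a minimal, self-contained sketch. It is not the Samza code: GaugeInitSketch, newGauge and the store names are hypothetical, and AtomicLong stands in for a Gauge. It assumes that constructing a gauge has a side effect (such as registering it), which is what a duplicate store name would trigger a second time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class GaugeInitSketch {
    // Hypothetical stand-in for "create a gauge and register it".
    // The counter records how many gauge objects were constructed.
    static AtomicLong newGauge(AtomicInteger created) {
        created.incrementAndGet();
        return new AtomicLong();
    }

    // putIfAbsent evaluates its value argument before the call, so a gauge
    // is constructed for every name, including duplicates.
    static int createdWithPutIfAbsent(List<String> storeNames) {
        AtomicInteger created = new AtomicInteger();
        Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();
        for (String name : storeNames) {
            gauges.putIfAbsent(name, newGauge(created));
        }
        return created.get();
    }

    // computeIfAbsent only runs the mapping function when the key is missing,
    // so a duplicate name does not construct a second gauge.
    static int createdWithComputeIfAbsent(List<String> storeNames) {
        AtomicInteger created = new AtomicInteger();
        Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();
        for (String name : storeNames) {
            gauges.computeIfAbsent(name, k -> newGauge(created));
        }
        return created.get();
    }

    public static void main(String[] args) {
        // Assumed input: one store name appears twice, as in cause 1.
        List<String> names = Arrays.asList("store-a", "side-input-store", "side-input-store");
        System.out.println(createdWithPutIfAbsent(names));     // prints 3
        System.out.println(createdWithComputeIfAbsent(names)); // prints 2
    }
}
```

   In this sketch the map holds two entries either way. The difference is only in how many gauge objects get constructed along the way.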
    
   Tests: 
   Added unit tests for StorageConfig to verify deduping.
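
   As a rough idea of what deduping a store name list and checking it can look like, here is a small sketch. It does not reproduce the actual StorageConfig change or its tests: StoreNameDedupSketch, dedup and the sample store names are hypothetical, and preserving first-seen order is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class StoreNameDedupSketch {
    // Removes duplicate names while keeping the order in which each name
    // first appeared (LinkedHashSet preserves insertion order).
    static List<String> dedup(List<String> storeNames) {
        return new ArrayList<>(new LinkedHashSet<>(storeNames));
    }

    public static void main(String[] args) {
        // Assumed input: the side input store is listed twice.
        List<String> names = Arrays.asList("store-a", "side-input-store", "side-input-store");
        List<String> deduped = dedup(names);

        // A test in this spirit would check that each name appears exactly once.
        if (!deduped.equals(Arrays.asList("store-a", "side-input-store"))) {
            throw new AssertionError("unexpected result: " + deduped);
        }
        System.out.println(deduped);
    }
}
```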


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [samza] prateekm merged pull request #1654: Side Inputs + Blob Store Backups - Part 1 - Fix duplicate uploads and broken metrics

Posted by "prateekm (via GitHub)" <gi...@apache.org>.
prateekm merged PR #1654:
URL: https://github.com/apache/samza/pull/1654




[GitHub] [samza] prateekm commented on pull request #1654: Side Inputs + Blob Store Backups - Part 1 - Fix duplicate uploads and broken metrics

Posted by "prateekm (via GitHub)" <gi...@apache.org>.
prateekm commented on PR #1654:
URL: https://github.com/apache/samza/pull/1654#issuecomment-1456876924

   cc @shekhars-li for review.

