Posted to commits@samza.apache.org by "Shekhar Sharma (Jira)" <ji...@apache.org> on 2021/06/14 20:43:00 UTC

[jira] [Updated] (SAMZA-2657) Blob store backed state backup and restore

     [ https://issues.apache.org/jira/browse/SAMZA-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shekhar Sharma updated SAMZA-2657:
----------------------------------
    Attachment:     (was: SAMZA-Backup-Restore-Design-Doc.pdf)

> Blob store backed state backup and restore
> ------------------------------------------
>
>                 Key: SAMZA-2657
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2657
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Shekhar Sharma
>            Assignee: Shekhar Sharma
>            Priority: Major
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> *Problem:*
> At LinkedIn, we noticed that jobs with large state take a long time (on the order of hours) to restore from a Kafka-based changelog.
> *Solution*: 
> We propose blob store based backup and restore for stateful jobs. The advantage of such a system is that state can be backed up and restored in parallel, rather than one message at a time as with a Kafka-based changelog. We implement a pluggable system that allows any blob store supporting PUT/GET/DELETE APIs to be plugged in as the backend for Samza state backup and restore.
> *Note:*
> At this time, only a general blob store interface is provided; users and the community are expected to implement the details specific to each blob store.
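
To illustrate the pluggable PUT/GET/DELETE idea and the parallel backup and restore described above, here is a minimal sketch. It is not the interface proposed in the design doc or shipped with Samza: the names SimpleBlobStore, InMemoryBlobStore, and StateBackup are hypothetical, and the in-memory map merely stands in for a real blob store backend.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface for illustration only; not Samza's actual API.
interface SimpleBlobStore {
    CompletableFuture<String> put(byte[] data);    // completes with the new blob ID
    CompletableFuture<byte[]> get(String blobId);  // completes with the blob contents
    CompletableFuture<Void> delete(String blobId);
}

// In-memory stand-in for a real backend (assumption: a real store would do network I/O here).
class InMemoryBlobStore implements SimpleBlobStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    public CompletableFuture<String> put(byte[] data) {
        return CompletableFuture.supplyAsync(() -> {
            String id = UUID.randomUUID().toString();
            blobs.put(id, data.clone());
            return id;
        });
    }

    public CompletableFuture<byte[]> get(String blobId) {
        return CompletableFuture.supplyAsync(() -> {
            byte[] data = blobs.get(blobId);
            if (data == null) {
                throw new IllegalArgumentException("Unknown blob: " + blobId);
            }
            return data.clone();
        });
    }

    public CompletableFuture<Void> delete(String blobId) {
        return CompletableFuture.runAsync(() -> blobs.remove(blobId));
    }
}

class StateBackup {
    // Starts every upload before waiting on any of them; returns file name -> blob ID.
    static Map<String, String> backup(SimpleBlobStore store, Map<String, byte[]> files) {
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        files.forEach((name, data) -> pending.put(name, store.put(data)));
        Map<String, String> index = new LinkedHashMap<>();
        pending.forEach((name, future) -> index.put(name, future.join()));
        return index;
    }

    // Starts every download before waiting on any of them; returns file name -> contents.
    static Map<String, byte[]> restore(SimpleBlobStore store, Map<String, String> index) {
        Map<String, CompletableFuture<byte[]>> pending = new LinkedHashMap<>();
        index.forEach((name, blobId) -> pending.put(name, store.get(blobId)));
        Map<String, byte[]> files = new LinkedHashMap<>();
        pending.forEach((name, future) -> files.put(name, future.join()));
        return files;
    }
}
```

Because all requests are issued before any result is awaited, transfers can overlap instead of being replayed one message at a time. A different backend would only need to implement the three interface methods.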



--
This message was sent by Atlassian Jira
(v8.3.4#803005)