Posted to dev@samza.apache.org by santhosh venkat <sa...@gmail.com> on 2016/09/01 18:06:22 UTC

Periodic cleanup of unused local stores

Currently in Samza, to enable reuse of local stores between restarts, each
local store is persisted outside of YARN's working directory. However, there
is currently no mechanism to periodically clean up unused local stores.
Here is a proposal detailing a possible way to accomplish this:

https://issues.apache.org/jira/secure/attachment/12826531/GCstalelocalstate.pdf

This is tracked in SAMZA-656. Any feedback/comments are welcome.

Thanks.
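
For readers without access to the attached PDF, the problem above can be
illustrated with a minimal sketch. It is not the design in the proposal and
not Samza code. It assumes, purely for illustration, that each job keeps its
stores in one directory named after the job under a common root, that the set
of active job names is known to the caller, and that a store directory
untouched for longer than some TTL is safe to delete. The class and method
names (StaleStoreCleaner, findStaleStores, deleteRecursively) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Samza. */
final class StaleStoreCleaner {

    /**
     * Returns the direct child directories of storeRoot that do not belong to
     * an active job and were last modified more than ttl before now.
     * Assumption: one directory per job, named after the job.
     */
    static List<Path> findStaleStores(Path storeRoot, Set<String> activeJobs,
                                      Duration ttl, Instant now) throws IOException {
        List<Path> stale = new ArrayList<>();
        Instant cutoff = now.minus(ttl);
        try (DirectoryStream<Path> children = Files.newDirectoryStream(storeRoot)) {
            for (Path child : children) {
                if (!Files.isDirectory(child)) {
                    continue; // ignore stray files under the root
                }
                if (activeJobs.contains(child.getFileName().toString())) {
                    continue; // never touch stores of jobs that are still active
                }
                Instant lastModified = Files.getLastModifiedTime(child).toInstant();
                if (lastModified.isBefore(cutoff)) {
                    stale.add(child);
                }
            }
        }
        return stale;
    }

    /** Deletes a directory tree, visiting children before their parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts "a/b" before "a".
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

A periodic task (for example, one scheduled with a
ScheduledExecutorService) could call findStaleStores and then
deleteRecursively on each result. How the set of active jobs is obtained is
the open question; the proposal itself should be consulted for the actual
mechanism.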

Re: Periodic cleanup of unused local stores

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
Hi Santhosh,

Thanks for picking up SAMZA-656. This is long overdue and will help make our
host-affinity based solution more robust. I have a few thoughts on
your design proposal.

1. It is always very useful to provide more context to the reader, esp. in
explaining what the different terms mean (like host-affinity, tombstone,
etc.) and how they relate to the problem being described.

2. "The Host Affinity feature in Samza enables it to restore local state
from disk instead of bootstrapping the entire changelog" -> host-affinity
as a feature only tries to bring up the container on the same host as
before. This helps Samza leverage the locally persisted store data. It
doesn't actually help it restore state in any way.

3. "To achieve this, Samza stores local state for change logged stores in a
shared directory so it is not tied to a resource manager’s storage
structure and cleanup schedule." -> I think by shared directory, you are
referring to the YARN application's workspace. This shared workspace is
part of the NodeManager (NM), not the ResourceManager (RM). You can rephrase
this and, additionally, provide the logical path to the state stores.

4. "Expose an API in samza-rest that" -> Can you elaborate on what the API
looks like?

5. Is the REST API to be invoked by the monitor for all jobs in the cluster
or only for running jobs? What are the criteria there? Please do mention
them, if any.

Thanks!
Navina

On Thu, Sep 1, 2016 at 11:06 AM, santhosh venkat <
santhoshvenkat1988@gmail.com> wrote:

> Currently in Samza, to enable reuse of local store between restarts, local
> store is persisted outside of the YARN’s working directory. However, there
> is no mechanism currently available to periodically clean up the unused
> local stores. Here is a proposal detailing a possible way to accomplish
> this:
>
> https://issues.apache.org/jira/secure/attachment/12826531/GCstalelocalstate.pdf
>
> This is tracked in SAMZA-656. Any feedback/comments are welcome.
>
> Thanks.
>



-- 
Navina R.
