Posted to dev@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2016/08/15 14:24:20 UTC

[jira] [Comment Edited] (STORM-2038) Provide an alternative to using symlinks

    [ https://issues.apache.org/jira/browse/STORM-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421036#comment-15421036 ] 

Robert Joseph Evans edited comment on STORM-2038 at 8/15/16 2:23 PM:
---------------------------------------------------------------------

Giving the worker a canonical path to the worker artifacts should be a fairly simple solution.  We were doing this previously for the logs dir anyway, so it should be simple to extend it and just disable the symlink when configured to do so.
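
As a rough illustration of the canonical-path idea, the sketch below resolves a directory that may be reached through a symlink to its real location, which could then be handed to a worker in place of the link.  The names {{CanonicalPaths}} and {{resolve}} are hypothetical and not part of Storm; only the {{java.nio.file}} calls are real.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Storm.
final class CanonicalPaths {
    private CanonicalPaths() {}

    // Returns the real, symlink-free absolute path of an existing file or directory.
    static Path resolve(Path possiblyLinked) throws IOException {
        // toRealPath() follows symbolic links by default and throws if the path does not exist.
        return possiblyLinked.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory; a real caller would pass the artifacts dir.
        Path dir = Files.createTempDirectory("artifacts");
        System.out.println(resolve(dir));
    }
}
```

Resolving once up front means the worker never has to create or follow a link itself, at the cost of not noticing if the link is later repointed.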

For the blob store we have a bit of a bigger problem.  The [Localizer|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/localizer/Localizer.java] sets up a chain of symlinks so that the old data downloaded from the blob store can remain in place until the new data is downloaded and ready.  At that point it updates one of the symlinks in the chain to atomically point it to the new location of the data.  There is some redundancy in the links that we could probably remove, but the path currently stands as:

{code}
${worker_pwd}/${link_name} -> ${topology_code_dir}/${link_name} -> ${localizer_cache}/${user}/.../${key}.current -> ${localizer_cache}/${user}/.../${key}.${version}
{code}
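
To make the atomic update step concrete, here is a minimal sketch of one common way to repoint a {{.current}}-style link: create the new link under a temporary name, then rename it over the old one.  This is not the Localizer's code; {{AtomicLinkSwap}}, {{repoint}} and the {{.tmp}} suffix are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not taken from Storm's Localizer.
final class AtomicLinkSwap {
    private AtomicLinkSwap() {}

    // Points `current` at `newTarget`, replacing any existing link with a single rename.
    static void repoint(Path current, Path newTarget) throws IOException {
        // Build the new link under a temporary name next to the final one.
        Path tmp = current.resolveSibling(current.getFileName() + ".tmp");
        Files.deleteIfExists(tmp); // removes a stale temp link itself, not its target
        Files.createSymbolicLink(tmp, newTarget);
        // Moving a symlink moves the link, not what it points to. On POSIX file
        // systems this is a rename that replaces an existing `current` atomically;
        // the Java API leaves replacement under ATOMIC_MOVE implementation specific.
        Files.move(tmp, current, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A reader that opens the link at any moment sees either the old version or the new one, never a missing link, which is the property a symlink-free design would have to reproduce some other way.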

If we removed all of the symlinks, in some cases we would need another way/API for the user to get the current list of blob paths to access.  We currently don't have a communication path from the supervisor to the worker.  We would need to add one, along with some bookkeeping so we can know which blob version is the current one.  We don't always rely on the version number to be monotonically increasing, just different from what we already have cached.  Any high-level API that we do add would need to work consistently both with and without symlinks.  Essentially it would need two implementations: one that relies on symlinks, so that when a symlink changes the API returns the correct thing, and another that just reads from this new communication path.
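
The two-implementation idea could be sketched roughly as follows.  Everything here is hypothetical: {{BlobPathResolver}}, both implementations and {{onBlobUpdated}} are invented names, and an in-memory map stands in for the supervisor-to-worker channel that does not exist yet.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical API; none of these types exist in Storm.
interface BlobPathResolver {
    // Returns the location of the current version of the blob with the given key.
    Path currentPath(String key) throws IOException;
}

// Variant 1: follow the "<key>.current" symlink on every call, so a repointed
// link is picked up automatically.
final class SymlinkBlobPathResolver implements BlobPathResolver {
    private final Path cacheDir;

    SymlinkBlobPathResolver(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    @Override
    public Path currentPath(String key) throws IOException {
        return cacheDir.resolve(key + ".current").toRealPath();
    }
}

// Variant 2: no symlinks; the current location is whatever was last announced.
final class PushedBlobPathResolver implements BlobPathResolver {
    private final Map<String, Path> current = new ConcurrentHashMap<>();

    // Would be called when the (assumed) channel reports a newly downloaded version.
    void onBlobUpdated(String key, Path newLocation) {
        current.put(key, newLocation);
    }

    @Override
    public Path currentPath(String key) throws IOException {
        Path p = current.get(key);
        if (p == null) {
            throw new NoSuchFileException(key);
        }
        return p;
    }
}
```

Because the second variant only records "the latest announced location", it needs versions to be distinguishable, not ordered, which matches the bookkeeping described above.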

There are a number of other features in the works that build on top of this functionality that would also need some rework.  STORM-2016 takes jars on the client and adds them to the blobstore/classpath for the worker (removes the requirement for an uber-jar).

I also know that [~jerrypeng] has been working on a few things that would allow you to change configs as part of a topology rebalance, although it is very preliminary.  It also has the potential to update a topology's jar or, combined with STORM-2016, a dependency of a topology, and so upgrade the topology on the fly without actually relaunching it.

None of this makes this work impossible, just not trivial.


> Provide an alternative to using symlinks
> ----------------------------------------
>
>                 Key: STORM-2038
>                 URL: https://issues.apache.org/jira/browse/STORM-2038
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>    Affects Versions: 1.0.1
>         Environment: Any windows
>            Reporter: Paul Milliken
>              Labels: symlink, windows
>
> As of Storm 1.0, some functionality (such as the worker-artifacts directory) requires the use of symlinks. On Windows platforms, this requires that Storm either be run as an administrator or that certain group policy settings be changed.
> In locked-down environments, neither of these solutions is suitable.
> Where possible, an alternative option should be provided to the use of symlinks. For example, it may be possible to create additional copies of the worker artifacts directory for each worker (possibly inefficient) or provide the workers with the canonical path to the real directory.
> See the [brief discussion|http://mail-archives.apache.org/mod_mbox/storm-dev/201608.mbox/%3C1293850887.13165119.1471022901569.JavaMail.yahoo%40mail.yahoo.com%3E] on the mailing list.


