Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/01/08 18:11:35 UTC

[jira] [Commented] (STORM-411) Extend file uploads to support more distributed cache like semantics

    [ https://issues.apache.org/jira/browse/STORM-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269586#comment-14269586 ] 

ASF GitHub Bot commented on STORM-411:
--------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/354#issuecomment-69212412
  
    From reading through the design document, my initial impression is that we are coupling Nimbus failover and leader election too closely to having a persistent store for the data.
    
    It feels to me like we want two different things.  One is a highly available store for blobs (we are working on an API for something similar to this for STORM-411) that we can write into and then query to know that a blob has been persisted.  Persistence could mean adequate replication, or whatever else that blob store decides it needs.
    
    The second thing we want is leader election/failover for Nimbus.
    
    By separating the two, we could easily run on YARN with only one Nimbus instance but with the data stored in HDFS; if Nimbus crashes, a new one comes up elsewhere and everything should work just fine.  Or we could run on EC2 where we want Nimbus to be hot/warm, but only run 2 instances instead of 3 because of cost, and store the data in S3 instead.  Exposing the replication count feels like an internal detail of the storage system that we really don't care that much about.
    
    It really feels like it would give us a lot more flexibility to have the storage API completely separate from the failover and leader election code.
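
    A minimal sketch of what that separation could look like (hypothetical, not Storm's actual API): a blob store interface that knows nothing about Nimbus leader election, so it can be backed by local disk, HDFS, or S3, while the failover code simply uses whichever implementation is configured.

        // Hypothetical sketch only -- illustrates a storage API kept separate
        // from Nimbus failover/leader election. Implementations could be
        // backed by local disk, HDFS, S3, etc.
        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;

        public interface BlobStore {
            /** Create a blob under the given key and return a stream to write its contents. */
            OutputStream createBlob(String key) throws IOException;

            /** Open an existing blob for reading. */
            InputStream readBlob(String key) throws IOException;

            /** True once the backend considers the blob durably persisted
             *  (sufficient replication, written to HDFS/S3, ... -- a backend detail). */
            boolean isPersisted(String key) throws IOException;

            /** Remove a blob. */
            void deleteBlob(String key) throws IOException;
        }

    Whichever Nimbus instance wins leader election would just load the configured implementation (for example by class name via reflection), without caring how the backend replicates or persists the data.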


> Extend file uploads to support more distributed cache like semantics
> --------------------------------------------------------------------
>
>                 Key: STORM-411
>                 URL: https://issues.apache.org/jira/browse/STORM-411
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> One of the big features that we are asked about for a hosted Storm instance is how to distribute and update large shared data sets for topologies.  These could be things like IP-to-geolocation tables, machine-learned models, or just about anything else.
> Currently with Storm you either have to package the data as part of your topology jar, install it on the machine ahead of time, or access an external service to pull it down.  Packaging it in the jar does not allow users to update the dataset without restarting their topologies, installing it on the machine will not work for a hosted Storm solution, and pulling it from an external service without the supervisors being aware of it means it would be downloaded multiple times and might not be cleaned up properly afterwards.
> I propose that instead we set up something similar to the distributed cache on Hadoop, but with a pluggable backend.  The APIs would be for a simple blob store, so it could be backed by local disk on Nimbus, HDFS, Swift, or even BitTorrent.
> Adding new "files" to the blob store or downloading them would by default go through nimbus, but if an external store is properly configured direct access into the store could be used.
> The worker process would access the files through symlinks in the current working directory of the worker.  For POSIX systems, when a new version of the file is made available, the symlink would atomically be replaced by a new one pointing to the new version.  Windows does not support atomic replacement of a symlink, so we should provide a simple library that returns resolved paths, can detect when the links have changed, and has retry logic built in for the case where the symlink disappears in the middle of a read (a sketch of such a helper appears after this description).
> We are in the early stages of implementing this functionality and would like some feedback on the concepts before getting too far along. 
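
As a concrete illustration of the library mentioned in the description, here is a minimal, hypothetical worker-side helper (not part of the proposal's actual code) that resolves the cached file's symlink to its current target and retries briefly if the link disappears while a new version is being swapped in.

    // Hypothetical sketch only -- resolves a symlinked cache entry to the
    // versioned file it currently points at, retrying if the link is briefly
    // missing because a new version is being swapped in.
    import java.io.IOException;
    import java.nio.file.NoSuchFileException;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public final class CachedFileResolver {
        private static final int MAX_RETRIES = 5;
        private static final long RETRY_SLEEP_MS = 100;

        /** Resolve the named symlink in the worker's working directory. */
        public static Path resolve(String linkName) throws IOException {
            Path link = Paths.get(linkName);
            IOException last = null;
            for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                try {
                    return link.toRealPath();   // follows the symlink to the real file
                } catch (NoSuchFileException e) {
                    last = e;                   // link is mid-swap; back off and retry
                    try {
                        Thread.sleep(RETRY_SLEEP_MS);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IOException("Interrupted resolving " + linkName, ie);
                    }
                }
            }
            throw new IOException("Gave up resolving " + linkName, last);
        }

        /** True if the link now points at a different version than before. */
        public static boolean hasChanged(String linkName, Path previouslyResolved) throws IOException {
            return !resolve(linkName).equals(previouslyResolved);
        }
    }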



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)