Posted to yarn-issues@hadoop.apache.org by "john lilley (JIRA)" <ji...@apache.org> on 2014/06/27 17:35:26 UTC
[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN

    [ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046065#comment-14046065 ] 

john lilley commented on YARN-896:
----------------------------------

Greetings!  Arun pointed me to this JIRA to see if this could potentially meet our needs.  We are an ISV that currently ships a data-quality/integration suite running as a native YARN application.  We are finding several use cases that would benefit from being able to manage a per-node persistent service.  MapReduce has its “shuffle auxiliary service”, but it isn’t straightforward to add auxiliary services because they cannot be loaded from HDFS, so we’d have to manage the distribution of JARs across nodes (please tell me if I’m wrong here…).

This seems to be addressing a lot of the issues around persistent services, and frankly I'm out of my depth in this discussion.  But if you all can help me understand whether this might help our situation, I'd be happy to have our team put shoulder to the wheel and help advance the development.  Please comment on our contemplated use case and help me understand if this is the right place to be.

Our software doesn't use MapReduce.  It is a pure YARN application that is basically a peer to MapReduce.  There are a lot of reasons for this decision, but the main one is that we have a large code base that already executes data transformations in a single-server environment, and we wanted to produce a product without rewriting huge swaths of code.  Given that, our software takes care of many things usually delegated to MapReduce, including distributed sort/partition (i.e. "the shuffle").  However, MapReduce has a special place in the ecosystem, in that it creates an auxiliary service to handle the distribution of shuffle data to reducers.  It doesn't look like third-party apps have an easy time installing aux services.  The JARs for any such service must be in Hadoop's classpath on all nodes at startup, creating both a management issue and a trust/security issue.  Currently our software places temporary data into HDFS for this purpose, but we've found that HDFS has a huge overhead in terms of performance and file handles, even at low replication.  We desire to replace the use of HDFS with a lighter-weight service to manage temp files and distribute their data.


> Roll up for long-lived services in YARN
> ---------------------------------------
>
>                 Key: YARN-896
>                 URL: https://issues.apache.org/jira/browse/YARN-896
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Robert Joseph Evans
>
> YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers.
> This ticket is intended to
>  # discuss what is needed to support long lived processes
>  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)