Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/01/23 13:36:34 UTC

[jira] [Commented] (SPARK-1526) Running spark driver program from my local machine

    [ https://issues.apache.org/jira/browse/SPARK-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289190#comment-14289190 ] 

Sean Owen commented on SPARK-1526:
----------------------------------

Closing this may be a little bold, but there has been no activity, I do not see an actionable change here, and I think there is a fine workaround for this case. Yes, it's a pretty fundamental property of Spark that the driver communicates a lot with the executors, and I can't see that changing. You can of course run the driver remotely; it's a matter of network configuration, and of having enough bandwidth for however much communication your driver and executors need, which is not necessarily a lot. Finally, you can of course access resources like DBs from your executors too. In fact, that is probably more sensible than loading data to the driver and then copying it again to the executors.
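
To illustrate both points concretely, here is a minimal sketch, not an endorsed recipe: the driver address/port, JDBC URL, credentials, table, and column names are all hypothetical, and it assumes the DB is reachable from the cluster and the JDBC driver jar is available on the executors.

{code:scala}
import java.sql.DriverManager

import org.apache.spark.{SparkConf, SparkContext}

object ExecutorSideDbRead {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("executor-side-db-read")
      // Running the driver on a remote machine mostly means the executors
      // must be able to connect back to it: advertise a reachable address
      // and pin the RPC port so it can be opened in a firewall.
      .set("spark.driver.host", "203.0.113.10") // hypothetical reachable address
      .set("spark.driver.port", "51000")        // hypothetical fixed port
    val sc = new SparkContext(conf)

    val numSlices = 4
    // Each task opens its own connection on the executor and pulls only its
    // slice of the table, so the rows never pass through the driver at all.
    // (Slicing by "id % n" assumes a hypothetical integer id column.)
    val paths = sc.parallelize(0 until numSlices, numSlices)
      .mapPartitionsWithIndex { (slice, _) =>
        val conn = DriverManager.getConnection(
          "jdbc:postgresql://db-host:5432/files", "user", "secret")
        try {
          val stmt = conn.prepareStatement(
            "SELECT path FROM hadoop_files WHERE id % ? = ?")
          stmt.setInt(1, numSlices)
          stmt.setInt(2, slice)
          val rs = stmt.executeQuery()
          val buf = scala.collection.mutable.ArrayBuffer.empty[String]
          while (rs.next()) buf += rs.getString("path")
          buf.iterator // fully materialized, so closing the connection is safe
        } finally {
          conn.close()
        }
      }

    println(s"file names fetched on executors: ${paths.count()}")
    sc.stop()
  }
}
{code}

For what it's worth, org.apache.spark.rdd.JdbcRDD already packages essentially this executor-side read pattern behind a single RDD.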

> Running spark driver program from my local machine
> --------------------------------------------------
>
>                 Key: SPARK-1526
>                 URL: https://issues.apache.org/jira/browse/SPARK-1526
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>            Reporter: Idan Zalzberg
>
> Currently it seems that the design choice is that the driver program should be close, network-wise, to the workers, and that connections may be created from either side.
> This makes using Spark somewhat harder, since when I develop locally I need to package not only my program but also all of its local dependencies.
> Let's say I have a local DB holding the names of files in Hadoop that I want to process with Spark; now my local DB needs to be accessible from the cluster so it can fetch the file names at runtime.
> The driver program is an awesome thing, but it loses some of its strength if you can't really run it anywhere.
> It seems to me that the problem is that the DAGScheduler needs to be close to the workers; maybe it shouldn't be embedded in the driver, then?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org