Posted to user@spark.apache.org by abhinav chowdary <ab...@gmail.com> on 2014/02/25 18:59:55 UTC

Sharing SparkContext

Hi,
       I am looking for ways to share the SparkContext, meaning I need to
be able to perform multiple operations on the same Spark context.

Below is the code of a simple app I am testing:

 import org.apache.spark.SparkContext

 object SimpleApp {
   def main(args: Array[String]) {
     println("Welcome to example application!")

     val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")

     println("Spark context created!")

     println("Creating RDD!")
   }
 }

Now once this context is created I want to access it to submit multiple
jobs/operations.
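
Concretely, something like the rough sketch below is what I mean: one
long-lived context, with several operations submitted against it, possibly
from separate threads (the RDD and the two actions here are placeholders
only):

 import org.apache.spark.SparkContext

 object SharedContextSketch {
   def main(args: Array[String]) {
     val sc = new SparkContext("spark://10.128.228.142:7077", "Shared Context Sketch")

     // Build and cache the RDD once; every job below reuses it.
     val data = sc.parallelize(1 to 1000000).cache()

     // Spark's scheduler accepts jobs from multiple threads on one context,
     // so each incoming request could be handled by its own thread.
     val sumThread = new Thread(new Runnable {
       def run() { println("sum = " + data.reduce(_ + _)) }
     })
     val countThread = new Thread(new Runnable {
       def run() { println("even count = " + data.filter(_ % 2 == 0).count()) }
     })
     sumThread.start(); countThread.start()
     sumThread.join(); countThread.join()

     sc.stop()
   }
 }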

Any help is much appreciated

Thanks

Re: Sharing SparkContext

Posted by abhinav chowdary <ab...@gmail.com>.
Thank you, Mayur.

I will try the Ooyala job server to begin with. Is there a way to load an
RDD created via the SparkContext into Shark? The only reason I ask is that
my RDD is being created from Cassandra (not Hadoop; we are trying to get
Shark to work with Cassandra as well, and are having trouble with it when
running in distributed mode).


On Tue, Feb 25, 2014 at 10:30 AM, Mayur Rustagi <ma...@gmail.com>wrote:

> fair scheduler merely reorders tasks .. I think he is looking to run
> multiple pieces of code on a single context on demand from customers...if
> the code & order is decided then fair scheduler will ensure that all tasks
> get equal cluster time :)
>
>
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski <
> ognen@nengoiksvelzud.com> wrote:
>
>>  Doesn't the fair scheduler solve this?
>> Ognen
>>
>>
>> On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>
>> Sorry for not being clear earlier
>> how do you want to pass the operations to the spark context?
>> this is partly what i am looking for . How to access the active spark
>> context and possible ways to pass operations
>>
>>  Thanks
>>
>>
>>
>>  On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <mayur.rustagi@gmail.com
>> > wrote:
>>
>>> how do you want to pass the operations to the spark context?
>>>
>>>
>>>  Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>>  https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>>> abhinav.chowdary@gmail.com> wrote:
>>>
>>>> Hi,
>>>>        I am looking for ways to share the sparkContext, meaning i need
>>>> to be able to perform multiple operations on the same spark context.
>>>>
>>>>  Below is code of a simple app i am testing
>>>>
>>>>   def main(args: Array[String]) {
>>>>     println("Welcome to example application!")
>>>>
>>>>      val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>>>> App")
>>>>
>>>>      println("Spark context created!")
>>>>
>>>>      println("Creating RDD!")
>>>>
>>>>  Now once this context is created i want to access  this to submit
>>>> multiple jobs/operations
>>>>
>>>>  Any help is much appreciated
>>>>
>>>>  Thanks
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>  --
>> Warm Regards
>> Abhinav Chowdary
>>
>>
>>
>


-- 
Warm Regards
Abhinav Chowdary

Re: Sharing SparkContext

Posted by abhinav chowdary <ab...@gmail.com>.
0.8.1. We used branch 0.8 and pulled the job server pull request into our
local repo. I remember we had to deal with a few issues, but once we got
through those it has been working great.
On Mar 10, 2014 6:51 PM, "Mayur Rustagi" <ma...@gmail.com> wrote:

> Which version of Spark  are you using?
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Mon, Mar 10, 2014 at 6:49 PM, abhinav chowdary <
> abhinav.chowdary@gmail.com> wrote:
>
>> for any one who is interested to know about job server from Ooyala.. we
>> started using it recently and been working great so far..
>> On Feb 25, 2014 9:23 PM, "Ognen Duzlevski" <og...@nengoiksvelzud.com>
>> wrote:
>>
>>>  In that case, I must have misunderstood the following (from
>>> http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html).
>>> Apologies. Ognen
>>>
>>> "Inside a given Spark application (SparkContext instance), multiple
>>> parallel jobs can run simultaneously if they were submitted from separate
>>> threads. By "job", in this section, we mean a Spark action (e.g. save,
>>> collect) and any tasks that need to run to evaluate that action.
>>> Spark's scheduler is fully thread-safe and supports this use case to enable
>>> applications that serve multiple requests (e.g. queries for multiple
>>> users).
>>>
>>> By default, Spark's scheduler runs jobs in FIFO fashion. Each job is
>>> divided into "stages" (e.g. map and reduce phases), and the first job gets
>>> priority on all available resources while its stages have tasks to launch,
>>> then the second job gets priority, etc. If the jobs at the head of the
>>> queue don't need to use the whole cluster, later jobs can start to run
>>> right away, but if the jobs at the head of the queue are large, then later
>>> jobs may be delayed significantly.
>>>
>>> Starting in Spark 0.8, it is also possible to configure fair sharing
>>> between jobs. Under fair sharing, Spark assigns tasks between jobs in a
>>> "round robin" fashion, so that all jobs get a roughly equal share of
>>> cluster resources. This means that short jobs submitted while a long job is
>>> running can start receiving resources right away and still get good
>>> response times, without waiting for the long job to finish. This mode is
>>> best for multi-user settings.
>>>
>>> To enable the fair scheduler, simply set the spark.scheduler.mode to
>>> FAIR before creating a SparkContext:"
>>> On 2/25/14, 12:30 PM, Mayur Rustagi wrote:
>>>
>>> fair scheduler merely reorders tasks .. I think he is looking to run
>>> multiple pieces of code on a single context on demand from customers...if
>>> the code & order is decided then fair scheduler will ensure that all tasks
>>> get equal cluster time :)
>>>
>>>
>>>
>>>  Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>>  https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>> On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski <
>>> ognen@nengoiksvelzud.com> wrote:
>>>
>>>>  Doesn't the fair scheduler solve this?
>>>> Ognen
>>>>
>>>>
>>>> On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>>>
>>>> Sorry for not being clear earlier
>>>> how do you want to pass the operations to the spark context?
>>>> this is partly what i am looking for . How to access the active spark
>>>> context and possible ways to pass operations
>>>>
>>>>  Thanks
>>>>
>>>>
>>>>
>>>>  On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <
>>>> mayur.rustagi@gmail.com> wrote:
>>>>
>>>>> how do you want to pass the operations to the spark context?
>>>>>
>>>>>
>>>>>  Mayur Rustagi
>>>>> Ph: +919632149971
>>>>> http://www.sigmoidanalytics.com
>>>>>  https://twitter.com/mayur_rustagi
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>>>>> abhinav.chowdary@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>        I am looking for ways to share the sparkContext, meaning i
>>>>>> need to be able to perform multiple operations on the same spark context.
>>>>>>
>>>>>>  Below is code of a simple app i am testing
>>>>>>
>>>>>>   def main(args: Array[String]) {
>>>>>>     println("Welcome to example application!")
>>>>>>
>>>>>>      val sc = new SparkContext("spark://10.128.228.142:7077",
>>>>>> "Simple App")
>>>>>>
>>>>>>      println("Spark context created!")
>>>>>>
>>>>>>      println("Creating RDD!")
>>>>>>
>>>>>>  Now once this context is created i want to access  this to submit
>>>>>> multiple jobs/operations
>>>>>>
>>>>>>  Any help is much appreciated
>>>>>>
>>>>>>  Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Warm Regards
>>>> Abhinav Chowdary
>>>>
>>>>
>>>>
>>>
>>>
>

Re: Sharing SparkContext

Posted by Mayur Rustagi <ma...@gmail.com>.
Which version of Spark  are you using?


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Mon, Mar 10, 2014 at 6:49 PM, abhinav chowdary <
abhinav.chowdary@gmail.com> wrote:

> for any one who is interested to know about job server from Ooyala.. we
> started using it recently and been working great so far..
> On Feb 25, 2014 9:23 PM, "Ognen Duzlevski" <og...@nengoiksvelzud.com>
> wrote:
>
>>  In that case, I must have misunderstood the following (from
>> http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html).
>> Apologies. Ognen
>>
>> "Inside a given Spark application (SparkContext instance), multiple
>> parallel jobs can run simultaneously if they were submitted from separate
>> threads. By “job”, in this section, we mean a Spark action (e.g. save,
>> collect) and any tasks that need to run to evaluate that action. Spark’s
>> scheduler is fully thread-safe and supports this use case to enable
>> applications that serve multiple requests (e.g. queries for multiple
>> users).
>>
>> By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is
>> divided into “stages” (e.g. map and reduce phases), and the first job gets
>> priority on all available resources while its stages have tasks to launch,
>> then the second job gets priority, etc. If the jobs at the head of the
>> queue don’t need to use the whole cluster, later jobs can start to run
>> right away, but if the jobs at the head of the queue are large, then later
>> jobs may be delayed significantly.
>>
>> Starting in Spark 0.8, it is also possible to configure fair sharing
>> between jobs. Under fair sharing, Spark assigns tasks between jobs in a
>> “round robin” fashion, so that all jobs get a roughly equal share of
>> cluster resources. This means that short jobs submitted while a long job is
>> running can start receiving resources right away and still get good
>> response times, without waiting for the long job to finish. This mode is
>> best for multi-user settings.
>>
>> To enable the fair scheduler, simply set the spark.scheduler.mode to FAIR
>>  before creating a SparkContext:"
>> On 2/25/14, 12:30 PM, Mayur Rustagi wrote:
>>
>> fair scheduler merely reorders tasks .. I think he is looking to run
>> multiple pieces of code on a single context on demand from customers...if
>> the code & order is decided then fair scheduler will ensure that all tasks
>> get equal cluster time :)
>>
>>
>>
>>  Mayur Rustagi
>> Ph: +919632149971
>> http://www.sigmoidanalytics.com
>>  https://twitter.com/mayur_rustagi
>>
>>
>>
>> On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski <
>> ognen@nengoiksvelzud.com> wrote:
>>
>>>  Doesn't the fair scheduler solve this?
>>> Ognen
>>>
>>>
>>> On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>>
>>> Sorry for not being clear earlier
>>> how do you want to pass the operations to the spark context?
>>> this is partly what i am looking for . How to access the active spark
>>> context and possible ways to pass operations
>>>
>>>  Thanks
>>>
>>>
>>>
>>>  On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <
>>> mayur.rustagi@gmail.com> wrote:
>>>
>>>> how do you want to pass the operations to the spark context?
>>>>
>>>>
>>>>  Mayur Rustagi
>>>> Ph: +919632149971
>>>> http://www.sigmoidanalytics.com
>>>>  https://twitter.com/mayur_rustagi
>>>>
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>>>> abhinav.chowdary@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>        I am looking for ways to share the sparkContext, meaning i need
>>>>> to be able to perform multiple operations on the same spark context.
>>>>>
>>>>>  Below is code of a simple app i am testing
>>>>>
>>>>>   def main(args: Array[String]) {
>>>>>     println("Welcome to example application!")
>>>>>
>>>>>      val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>>>>> App")
>>>>>
>>>>>      println("Spark context created!")
>>>>>
>>>>>      println("Creating RDD!")
>>>>>
>>>>>  Now once this context is created i want to access  this to submit
>>>>> multiple jobs/operations
>>>>>
>>>>>  Any help is much appreciated
>>>>>
>>>>>  Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>  --
>>> Warm Regards
>>> Abhinav Chowdary
>>>
>>>
>>>
>>
>>

Re: Sharing SparkContext

Posted by abhinav chowdary <ab...@gmail.com>.
HDFS 1.0.4, but we primarily use Cassandra + Spark (Calliope). I tested it
with both.
 Are you using it with HDFS? What version of Hadoop? 1.0.4?
Ognen

On 3/10/14, 8:49 PM, abhinav chowdary wrote:

for any one who is interested to know about job server from Ooyala.. we
started using it recently and been working great so far..
On Feb 25, 2014 9:23 PM, "Ognen Duzlevski" <og...@nengoiksvelzud.com> wrote:

>  In that case, I must have misunderstood the following (from
> http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html).
> Apologies. Ognen
>
> "Inside a given Spark application (SparkContext instance), multiple
> parallel jobs can run simultaneously if they were submitted from separate
> threads. By "job", in this section, we mean a Spark action (e.g. save,
> collect) and any tasks that need to run to evaluate that action. Spark's
> scheduler is fully thread-safe and supports this use case to enable
> applications that serve multiple requests (e.g. queries for multiple
> users).
>
> By default, Spark's scheduler runs jobs in FIFO fashion. Each job is
> divided into "stages" (e.g. map and reduce phases), and the first job gets
> priority on all available resources while its stages have tasks to launch,
> then the second job gets priority, etc. If the jobs at the head of the
> queue don't need to use the whole cluster, later jobs can start to run
> right away, but if the jobs at the head of the queue are large, then later
> jobs may be delayed significantly.
>
> Starting in Spark 0.8, it is also possible to configure fair sharing
> between jobs. Under fair sharing, Spark assigns tasks between jobs in a
> "round robin" fashion, so that all jobs get a roughly equal share of
> cluster resources. This means that short jobs submitted while a long job is
> running can start receiving resources right away and still get good
> response times, without waiting for the long job to finish. This mode is
> best for multi-user settings.
>
> To enable the fair scheduler, simply set the spark.scheduler.mode to FAIR before
> creating a SparkContext:"
> On 2/25/14, 12:30 PM, Mayur Rustagi wrote:
>
> fair scheduler merely reorders tasks .. I think he is looking to run
> multiple pieces of code on a single context on demand from customers...if
> the code & order is decided then fair scheduler will ensure that all tasks
> get equal cluster time :)
>
>
>
>  Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
>  https://twitter.com/mayur_rustagi
>
>
>
> On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski <
> ognen@nengoiksvelzud.com> wrote:
>
>>  Doesn't the fair scheduler solve this?
>> Ognen
>>
>>
>> On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>
>> Sorry for not being clear earlier
>> how do you want to pass the operations to the spark context?
>> this is partly what i am looking for . How to access the active spark
>> context and possible ways to pass operations
>>
>>  Thanks
>>
>>
>>
>>  On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <mayur.rustagi@gmail.com
>> > wrote:
>>
>>> how do you want to pass the operations to the spark context?
>>>
>>>
>>>  Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>>  https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>>> abhinav.chowdary@gmail.com> wrote:
>>>
>>>> Hi,
>>>>        I am looking for ways to share the sparkContext, meaning i need
>>>> to be able to perform multiple operations on the same spark context.
>>>>
>>>>  Below is code of a simple app i am testing
>>>>
>>>>   def main(args: Array[String]) {
>>>>     println("Welcome to example application!")
>>>>
>>>>      val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>>>> App")
>>>>
>>>>      println("Spark context created!")
>>>>
>>>>      println("Creating RDD!")
>>>>
>>>>  Now once this context is created i want to access  this to submit
>>>> multiple jobs/operations
>>>>
>>>>  Any help is much appreciated
>>>>
>>>>  Thanks
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>  --
>> Warm Regards
>> Abhinav Chowdary
>>
>>
>>
>
>
-- 
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
-- Jamie Zawinski

Re: Sharing SparkContext

Posted by Ognen Duzlevski <og...@plainvanillagames.com>.
Are you using it with HDFS? What version of Hadoop? 1.0.4?
Ognen

On 3/10/14, 8:49 PM, abhinav chowdary wrote:
>
> for any one who is interested to know about job server from Ooyala.. 
> we started using it recently and been working great so far..
>
> On Feb 25, 2014 9:23 PM, "Ognen Duzlevski" <ognen@nengoiksvelzud.com> wrote:
>
>     In that case, I must have misunderstood the following (from
>     http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html).
>     Apologies. Ognen
>
>     "Inside a given Spark application (SparkContext instance),
>     multiple parallel jobs can run simultaneously if they were
>     submitted from separate threads. By "job", in this section, we
>     mean a Spark action (e.g. save, collect) and any tasks that need
>     to run to evaluate that action. Spark's scheduler is fully
>     thread-safe and supports this use case to enable applications that
>     serve multiple requests (e.g. queries for multiple users).
>
>     By default, Spark's scheduler runs jobs in FIFO fashion. Each job
>     is divided into "stages" (e.g. map and reduce phases), and the
>     first job gets priority on all available resources while its
>     stages have tasks to launch, then the second job gets priority,
>     etc. If the jobs at the head of the queue don't need to use the
>     whole cluster, later jobs can start to run right away, but if the
>     jobs at the head of the queue are large, then later jobs may be
>     delayed significantly.
>
>     Starting in Spark 0.8, it is also possible to configure fair
>     sharing between jobs. Under fair sharing, Spark assigns tasks
>     between jobs in a "round robin" fashion, so that all jobs get a
>     roughly equal share of cluster resources. This means that short
>     jobs submitted while a long job is running can start receiving
>     resources right away and still get good response times, without
>     waiting for the long job to finish. This mode is best for
>     multi-user settings.
>
>     To enable the fair scheduler, simply set
>     the spark.scheduler.mode to FAIR before creating a SparkContext:"
>
>     On 2/25/14, 12:30 PM, Mayur Rustagi wrote:
>>     fair scheduler merely reorders tasks .. I think he is looking to
>>     run multiple pieces of code on a single context on demand from
>>     customers...if the code & order is decided then fair scheduler
>>     will ensure that all tasks get equal cluster time :)
>>
>>
>>
>>     Mayur Rustagi
>>     Ph: +919632149971
>>     http://www.sigmoidanalytics.com
>>     https://twitter.com/mayur_rustagi
>>
>>
>>
>>     On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski
>>     <ognen@nengoiksvelzud.com> wrote:
>>
>>         Doesn't the fair scheduler solve this?
>>         Ognen
>>
>>
>>         On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>>         Sorry for not being clear earlier
>>>         how do you want to pass the operations to the spark context?
>>>         this is partly what i am looking for . How to access the
>>>         active spark context and possible ways to pass operations
>>>
>>>         Thanks
>>>
>>>
>>>
>>>         On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi
>>>         <mayur.rustagi@gmail.com> wrote:
>>>
>>>             how do you want to pass the operations to the spark context?
>>>
>>>
>>>             Mayur Rustagi
>>>             Ph: +919632149971
>>>             http://www.sigmoidanalytics.com
>>>             https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>>             On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary
>>>             <abhinav.chowdary@gmail.com> wrote:
>>>
>>>                 Hi,
>>>                        I am looking for ways to share the
>>>                 sparkContext, meaning i need to be able to perform
>>>                 multiple operations on the same spark context.
>>>
>>>                 Below is code of a simple app i am testing
>>>
>>>                  def main(args: Array[String]) {
>>>                 println("Welcome to example application!")
>>>
>>>                     val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")
>>>
>>>                 println("Spark context created!")
>>>
>>>                 println("Creating RDD!")
>>>
>>>                 Now once this context is created i want to access
>>>                  this to submit multiple jobs/operations
>>>
>>>                 Any help is much appreciated
>>>
>>>                 Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>         -- 
>>>         Warm Regards
>>>         Abhinav Chowdary
>>
>>
>

-- 
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski


Re: Sharing SparkContext

Posted by abhinav chowdary <ab...@gmail.com>.
For anyone who is interested in the job server from Ooyala: we started
using it recently and it has been working great so far.
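
For anyone evaluating it: the job server owns the shared SparkContext and
hands it to each submitted job, so a job looks roughly like the sketch below.
This is only a sketch; the trait, package and config key names follow the job
server's documented API as I remember it and may differ slightly in the
Ooyala branch.

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // The job server creates and reuses the SparkContext; jobs only receive it.
  def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  def runJob(sc: SparkContext, config: Config): Any = {
    val input = config.getString("input.path")   // supplied with each request
    sc.textFile(input).flatMap(_.split("\\s+")).countByValue()
  }
}
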
On Feb 25, 2014 9:23 PM, "Ognen Duzlevski" <og...@nengoiksvelzud.com> wrote:

>  In that case, I must have misunderstood the following (from
> http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html).
> Apologies. Ognen
>
> "Inside a given Spark application (SparkContext instance), multiple
> parallel jobs can run simultaneously if they were submitted from separate
> threads. By "job", in this section, we mean a Spark action (e.g. save,
> collect) and any tasks that need to run to evaluate that action. Spark's
> scheduler is fully thread-safe and supports this use case to enable
> applications that serve multiple requests (e.g. queries for multiple
> users).
>
> By default, Spark's scheduler runs jobs in FIFO fashion. Each job is
> divided into "stages" (e.g. map and reduce phases), and the first job gets
> priority on all available resources while its stages have tasks to launch,
> then the second job gets priority, etc. If the jobs at the head of the
> queue don't need to use the whole cluster, later jobs can start to run
> right away, but if the jobs at the head of the queue are large, then later
> jobs may be delayed significantly.
>
> Starting in Spark 0.8, it is also possible to configure fair sharing
> between jobs. Under fair sharing, Spark assigns tasks between jobs in a
> "round robin" fashion, so that all jobs get a roughly equal share of
> cluster resources. This means that short jobs submitted while a long job is
> running can start receiving resources right away and still get good
> response times, without waiting for the long job to finish. This mode is
> best for multi-user settings.
>
> To enable the fair scheduler, simply set the spark.scheduler.mode to FAIR before
> creating a SparkContext:"
> On 2/25/14, 12:30 PM, Mayur Rustagi wrote:
>
> fair scheduler merely reorders tasks .. I think he is looking to run
> multiple pieces of code on a single context on demand from customers...if
> the code & order is decided then fair scheduler will ensure that all tasks
> get equal cluster time :)
>
>
>
>  Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
>  https://twitter.com/mayur_rustagi
>
>
>
> On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski <
> ognen@nengoiksvelzud.com> wrote:
>
>>  Doesn't the fair scheduler solve this?
>> Ognen
>>
>>
>> On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>
>> Sorry for not being clear earlier
>> how do you want to pass the operations to the spark context?
>> this is partly what i am looking for . How to access the active spark
>> context and possible ways to pass operations
>>
>>  Thanks
>>
>>
>>
>>  On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <mayur.rustagi@gmail.com
>> > wrote:
>>
>>> how do you want to pass the operations to the spark context?
>>>
>>>
>>>  Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>>  https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>>> abhinav.chowdary@gmail.com> wrote:
>>>
>>>> Hi,
>>>>        I am looking for ways to share the sparkContext, meaning i need
>>>> to be able to perform multiple operations on the same spark context.
>>>>
>>>>  Below is code of a simple app i am testing
>>>>
>>>>   def main(args: Array[String]) {
>>>>     println("Welcome to example application!")
>>>>
>>>>      val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>>>> App")
>>>>
>>>>      println("Spark context created!")
>>>>
>>>>      println("Creating RDD!")
>>>>
>>>>  Now once this context is created i want to access  this to submit
>>>> multiple jobs/operations
>>>>
>>>>  Any help is much appreciated
>>>>
>>>>  Thanks
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>  --
>> Warm Regards
>> Abhinav Chowdary
>>
>>
>>
>
>

Re: Sharing SparkContext

Posted by Ognen Duzlevski <og...@nengoiksvelzud.com>.
In that case, I must have misunderstood the following (from 
http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html). 
Apologies. Ognen

"Inside a given Spark application (SparkContext instance), multiple 
parallel jobs can run simultaneously if they were submitted from 
separate threads. By “job”, in this section, we mean a Spark action 
(e.g. save, collect) and any tasks that need to run to evaluate that 
action. Spark’s scheduler is fully thread-safe and supports this use 
case to enable applications that serve multiple requests (e.g. queries 
for multiple users).

By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is 
divided into “stages” (e.g. map and reduce phases), and the first job 
gets priority on all available resources while its stages have tasks to 
launch, then the second job gets priority, etc. If the jobs at the head 
of the queue don’t need to use the whole cluster, later jobs can start 
to run right away, but if the jobs at the head of the queue are large, 
then later jobs may be delayed significantly.

Starting in Spark 0.8, it is also possible to configure fair sharing 
between jobs. Under fair sharing, Spark assigns tasks between jobs in a 
“round robin” fashion, so that all jobs get a roughly equal share of 
cluster resources. This means that short jobs submitted while a long job 
is running can start receiving resources right away and still get good 
response times, without waiting for the long job to finish. This mode is 
best for multi-user settings.

To enable the fair scheduler, simply set 
the spark.scheduler.mode to FAIR before creating a SparkContext:"
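
The snippet the docs show at that point is roughly the following (a sketch in
the 0.8.x style, where the mode is set through a system property before the
context is built; the master URL and app name are placeholders):

import org.apache.spark.SparkContext

object FairSchedulingSketch {
  def main(args: Array[String]) {
    // Must be set before the SparkContext is constructed.
    System.setProperty("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext("spark://master:7077", "Fair Scheduling Sketch")
    // ... submit jobs as usual, e.g. from multiple threads ...
    sc.stop()
  }
}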

On 2/25/14, 12:30 PM, Mayur Rustagi wrote:
> fair scheduler merely reorders tasks .. I think he is looking to run 
> multiple pieces of code on a single context on demand from 
> customers...if the code & order is decided then fair scheduler will 
> ensure that all tasks get equal cluster time :)
>
>
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski 
> <ognen@nengoiksvelzud.com> wrote:
>
>     Doesn't the fair scheduler solve this?
>     Ognen
>
>
>     On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>>     Sorry for not being clear earlier
>>     how do you want to pass the operations to the spark context?
>>     this is partly what i am looking for . How to access the active
>>     spark context and possible ways to pass operations
>>
>>     Thanks
>>
>>
>>
>>     On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi
>>     <mayur.rustagi@gmail.com> wrote:
>>
>>         how do you want to pass the operations to the spark context?
>>
>>
>>         Mayur Rustagi
>>         Ph: +919632149971
>>         http://www.sigmoidanalytics.com
>>         https://twitter.com/mayur_rustagi
>>
>>
>>
>>         On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary
>>         <abhinav.chowdary@gmail.com> wrote:
>>
>>             Hi,
>>                    I am looking for ways to share the sparkContext,
>>             meaning i need to be able to perform multiple operations
>>             on the same spark context.
>>
>>             Below is code of a simple app i am testing
>>
>>              def main(args: Array[String]) {
>>                 println("Welcome to example application!")
>>
>>                 val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")
>>
>>                 println("Spark context created!")
>>
>>                 println("Creating RDD!")
>>
>>             Now once this context is created i want to access  this
>>             to submit multiple jobs/operations
>>
>>             Any help is much appreciated
>>
>>             Thanks
>>
>>
>>
>>
>>
>>
>>
>>     -- 
>>     Warm Regards
>>     Abhinav Chowdary
>
>


Re: Sharing SparkContext

Posted by Mayur Rustagi <ma...@gmail.com>.
The fair scheduler merely reorders tasks. I think he is looking to run
multiple pieces of code on a single context, on demand from customers... if
the code & order are decided, then the fair scheduler will ensure that all
tasks get equal cluster time :)



Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski
<og...@nengoiksvelzud.com>wrote:

>  Doesn't the fair scheduler solve this?
> Ognen
>
>
> On 2/25/14, 12:08 PM, abhinav chowdary wrote:
>
> Sorry for not being clear earlier
> how do you want to pass the operations to the spark context?
> this is partly what i am looking for . How to access the active spark
> context and possible ways to pass operations
>
>  Thanks
>
>
>
>  On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <ma...@gmail.com>wrote:
>
>> how do you want to pass the operations to the spark context?
>>
>>
>>  Mayur Rustagi
>> Ph: +919632149971
>> http://www.sigmoidanalytics.com
>>  https://twitter.com/mayur_rustagi
>>
>>
>>
>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>> abhinav.chowdary@gmail.com> wrote:
>>
>>> Hi,
>>>        I am looking for ways to share the sparkContext, meaning i need
>>> to be able to perform multiple operations on the same spark context.
>>>
>>>  Below is code of a simple app i am testing
>>>
>>>   def main(args: Array[String]) {
>>>     println("Welcome to example application!")
>>>
>>>      val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>>> App")
>>>
>>>      println("Spark context created!")
>>>
>>>      println("Creating RDD!")
>>>
>>>  Now once this context is created i want to access  this to submit
>>> multiple jobs/operations
>>>
>>>  Any help is much appreciated
>>>
>>>  Thanks
>>>
>>>
>>>
>>>
>>
>
>
>  --
> Warm Regards
> Abhinav Chowdary
>
>
>

Re: Sharing SparkContext

Posted by Ognen Duzlevski <og...@nengoiksvelzud.com>.
Doesn't the fair scheduler solve this?
Ognen

On 2/25/14, 12:08 PM, abhinav chowdary wrote:
> Sorry for not being clear earlier
> how do you want to pass the operations to the spark context?
> this is partly what i am looking for . How to access the active spark 
> context and possible ways to pass operations
>
> Thanks
>
>
>
> On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi 
> <mayur.rustagi@gmail.com> wrote:
>
>     how do you want to pass the operations to the spark context?
>
>
>     Mayur Rustagi
>     Ph: +919632149971
>     http://www.sigmoidanalytics.com
>     https://twitter.com/mayur_rustagi
>
>
>
>     On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary
>     <abhinav.chowdary@gmail.com> wrote:
>
>         Hi,
>                I am looking for ways to share the sparkContext,
>         meaning i need to be able to perform multiple operations on
>         the same spark context.
>
>         Below is code of a simple app i am testing
>
>          def main(args: Array[String]) {
>             println("Welcome to example application!")
>
>             val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")
>
>             println("Spark context created!")
>
>             println("Creating RDD!")
>
>         Now once this context is created i want to access  this to
>         submit multiple jobs/operations
>
>         Any help is much appreciated
>
>         Thanks
>
>
>
>
>
>
>
> -- 
> Warm Regards
> Abhinav Chowdary


Re: Sharing SparkContext

Posted by Ognen Duzlevski <og...@plainvanillagames.com>.
On 2/25/14, 12:24 PM, Mayur Rustagi wrote:
> So there is no way to share context currently,
> 1. you can try jobserver by Ooyala but I havnt used it & frankly 
> nobody has shared feedback on it.

One of the major showstoppers for me is that, when compiled against Hadoop
2.2.0, the Ooyala standalone server from the jobserver branch does not
work. If you are OK staying with Hadoop 1.0.4, it does work.

Ognen

Re: Sharing SparkContext

Posted by Mayur Rustagi <ma...@gmail.com>.
So there is no way to share a context currently.
1. You can try the jobserver by Ooyala, but I haven't used it & frankly
nobody has shared feedback on it.
2. If you can load that RDD into Shark, then you get a SQL interface on that
RDD + columnar storage.
3. You can try a crude method of starting a spark shell & passing commands
to it after receiving them through an HTML interface etc., but you'll have
to do the hard work of managing concurrency.
I was wondering about the use case: are you looking to pass a Spark closure
on the RDD & transform it each time, or looking to avoid caching the RDD
again & again?
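
If the goal is mainly the second one, here is a rough sketch of what a single
shared context buys you (the path and keywords are placeholders): the RDD is
materialised and cached once, and every later request hits the cached copy
instead of rebuilding it in a fresh context.

import org.apache.spark.SparkContext

object CachedRddSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "Cached RDD Sketch")

    // Loaded and cached once for the lifetime of the shared context.
    val events = sc.textFile("hdfs:///data/events").cache()

    // Each incoming "query" reuses the cached RDD instead of reloading it.
    def countFor(keyword: String): Long =
      events.filter(_.contains(keyword)).count()

    println(countFor("login"))
    println(countFor("purchase"))

    sc.stop()
  }
}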





Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Tue, Feb 25, 2014 at 10:08 AM, abhinav chowdary <
abhinav.chowdary@gmail.com> wrote:

> Sorry for not being clear earlier
>
> how do you want to pass the operations to the spark context?
> this is partly what i am looking for . How to access the active spark
> context and possible ways to pass operations
>
> Thanks
>
>
>
> On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <ma...@gmail.com>wrote:
>
>> how do you want to pass the operations to the spark context?
>>
>>
>> Mayur Rustagi
>> Ph: +919632149971
>> http://www.sigmoidanalytics.com
>> https://twitter.com/mayur_rustagi
>>
>>
>>
>> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
>> abhinav.chowdary@gmail.com> wrote:
>>
>>> Hi,
>>>        I am looking for ways to share the sparkContext, meaning i need
>>> to be able to perform multiple operations on the same spark context.
>>>
>>> Below is code of a simple app i am testing
>>>
>>>  def main(args: Array[String]) {
>>>     println("Welcome to example application!")
>>>
>>>     val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>>> App")
>>>
>>>     println("Spark context created!")
>>>
>>>     println("Creating RDD!")
>>>
>>> Now once this context is created i want to access  this to submit
>>> multiple jobs/operations
>>>
>>> Any help is much appreciated
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>
>
>
> --
> Warm Regards
> Abhinav Chowdary
>

Re: Sharing SparkContext

Posted by abhinav chowdary <ab...@gmail.com>.
Sorry for not being clear earlier.
how do you want to pass the operations to the spark context?
This is partly what I am looking for: how to access the active Spark
context, and possible ways to pass operations to it.

Thanks



On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi <ma...@gmail.com>wrote:

> how do you want to pass the operations to the spark context?
>
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
> abhinav.chowdary@gmail.com> wrote:
>
>> Hi,
>>        I am looking for ways to share the sparkContext, meaning i need to
>> be able to perform multiple operations on the same spark context.
>>
>> Below is code of a simple app i am testing
>>
>>  def main(args: Array[String]) {
>>     println("Welcome to example application!")
>>
>>     val sc = new SparkContext("spark://10.128.228.142:7077", "Simple
>> App")
>>
>>     println("Spark context created!")
>>
>>     println("Creating RDD!")
>>
>> Now once this context is created i want to access  this to submit
>> multiple jobs/operations
>>
>> Any help is much appreciated
>>
>> Thanks
>>
>>
>>
>>
>


-- 
Warm Regards
Abhinav Chowdary

Re: Sharing SparkContext

Posted by Mayur Rustagi <ma...@gmail.com>.
how do you want to pass the operations to the spark context?


Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary <
abhinav.chowdary@gmail.com> wrote:

> Hi,
>        I am looking for ways to share the sparkContext, meaning i need to
> be able to perform multiple operations on the same spark context.
>
> Below is code of a simple app i am testing
>
>  def main(args: Array[String]) {
>     println("Welcome to example application!")
>
>     val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")
>
>     println("Spark context created!")
>
>     println("Creating RDD!")
>
> Now once this context is created i want to access  this to submit multiple
> jobs/operations
>
> Any help is much appreciated
>
> Thanks
>
>
>
>