Posted to user@spark.apache.org by Steve Annessa <st...@gmail.com> on 2016/02/05 02:36:56 UTC

Unit test with sqlContext

I'm trying to unit test a function that reads in a JSON file, manipulates
the resulting DataFrame, and then returns a Scala Map.

The function has signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)

I've created a bootstrap spec for Spark jobs that instantiates the
SparkContext and SQLContext like so:

@transient var sc: SparkContext = _
@transient var sqlContext: SQLContext = _

override def beforeAll(): Unit = {
  // clear leftover port properties from any previous context
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")

  val conf = new SparkConf()
    .setMaster(master)
    .setAppName(appName)

  sc = new SparkContext(conf)
  sqlContext = new SQLContext(sc)
}

When I do not include sqlContext, my tests run. Once I add the sqlContext I
get the following errors:

16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
constructed (or threw an exception in its constructor).  This may indicate
an error, since only one SparkContext may be running in this JVM (see
SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not
unique!

and finally:

[info] IngestSpec:
[info] Exception encountered when attempting to run a suite with class
name: com.company.package.IngestSpec *** ABORTED ***
[info]   akka.actor.InvalidActorNameException: actor name
[ExecutorEndpoint] is not unique!


What do I need to do to get a SQLContext working in my tests?

Thanks,

-- Steve

Re: Unit test with sqlContext

Posted by Vikas Kawadia <ka...@gmail.com>.
If you prefer the py.test framework, I just wrote a blog post with some
examples:

Unit testing Apache Spark with py.test
https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

Re: Unit test with sqlContext

Posted by Steve Annessa <st...@gmail.com>.
Thanks for all of the responses.

I do have an afterAll that stops the sc.

While looking over Holden's readme I noticed she mentioned "Make sure to
disable parallel execution." That was what I was missing; I added the
following to my build.sbt:

```
parallelExecution in Test := false
```

Now all of my tests are running.

I'm going to look into using the package she created.
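
From a quick look at its README, a minimal suite with it would be roughly
this (an untested sketch on my end; SharedSparkContext provides sc and
stops it for you, and the test body, file path, and assertion are just
illustrative, assuming the ingest function is in scope):

```
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

class IngestSpec extends FunSuite with SharedSparkContext {
  test("ingest returns a non-empty Map") {
    // sc is created (and stopped) by SharedSparkContext
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val result = ingest("src/test/resources/sample.json", sc, sqlContext)
    assert(result.nonEmpty)
  }
}
```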

Thanks again,

-- Steve


Re: Unit test with sqlContext

Posted by Rishi Mishra <rm...@snappydata.io>.
Hi Steve,
Have you cleaned up your SparkContext (sc.stop()) in an afterAll()? The
error suggests you are creating more than one SparkContext.
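
A minimal sketch of that setup/teardown pairing (assuming ScalaTest's
BeforeAndAfterAll; the master and app name values are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class SparkSpec extends FunSuite with BeforeAndAfterAll {
  @transient var sc: SparkContext = _

  override def beforeAll(): Unit = {
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("unit-tests"))
  }

  override def afterAll(): Unit = {
    // stop the context so the next suite can create its own
    if (sc != null) sc.stop()
    sc = null
  }
}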


-- 
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra

Re: Unit test with sqlContext

Posted by Holden Karau <ho...@pigscanfly.ca>.
Thanks for recommending spark-testing-base :) Just wanted to add: if anyone
has feature requests for Spark testing, please get in touch (or open an
issue on GitHub) :)


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Unit test with sqlContext

Posted by Silvio Fiorito <si...@granturing.com>.
Hi Steve,

Have you looked at the spark-testing-base package by Holden? It’s really useful for unit testing Spark apps as it handles all the bootstrapping for you.

https://github.com/holdenk/spark-testing-base

DataFrame examples are here: https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
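
As a rough sketch (from memory, so double-check the repo for exact usage),
a DataFrame test with it can be as small as this; sc and sqlContext come
from the trait, and the file path and assertion are placeholders, assuming
your ingest function is in scope:

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class IngestSpec extends FunSuite with DataFrameSuiteBase {
  test("ingest builds the expected Map") {
    // sc and sqlContext are supplied by DataFrameSuiteBase
    val result = ingest("src/test/resources/sample.json", sc, sqlContext)
    assert(result.contains("expectedKey"))
  }
}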

Thanks,
Silvio
