Posted to user@spark.apache.org by Mike Trienis <mi...@orcsol.com> on 2015/08/25 20:10:51 UTC

How to unit test HiveContext without OutOfMemoryError (using sbt)

Hello,

I am using sbt and created a unit test where I create a `HiveContext`,
execute a query, and then return. Each time I run the unit test, the JVM
increases its memory usage until I get the error:

Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
Exception in thread "Thread-2" java.io.EOFException

As a work-around, I can fork a new JVM each time I run the unit test;
however, it seems like a bad solution as it takes a while to run the unit
test.

By the way, I tried importing the TestHiveContext:

   - import org.apache.spark.sql.hive.test.TestHiveContext

However, it suffers from the same memory issue. Has anyone else run into
the same problem? Note that I am running these unit tests on my Mac.
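
For reference, the test is roughly shaped like the following (a simplified
sketch; the suite name and query are made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class HiveQuerySpec extends FunSuite with BeforeAndAfterAll {
      private var sc: SparkContext = _
      private var hive: HiveContext = _

      override def beforeAll(): Unit = {
        // local[2] keeps the test self-contained on a laptop
        sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("HiveQuerySpec"))
        hive = new HiveContext(sc)
      }

      test("runs a simple query") {
        assert(hive.sql("SELECT 1").collect().length === 1)
      }

      override def afterAll(): Unit = {
        // stop the context so the next run can create a fresh one
        sc.stop()
      }
    }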

Cheers, Mike.

Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

Posted by Michael Armbrust <mi...@databricks.com>.
I'd suggest setting sbt to fork when running tests.
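
For example, in build.sbt (sbt 0.13 syntax; a minimal sketch):

    // Run the tests in a forked JVM. PermGen leaked by HiveContext then dies
    // with that JVM instead of accumulating in sbt's resident process across
    // repeated test runs.
    fork in Test := true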


Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

Posted by Mike Trienis <mi...@orcsol.com>.
Thanks for your response Yana,

I can increase the MaxPermSize parameter and it will allow me to run the
unit test a few more times before I run out of memory.

However, the primary issue is that running the same unit test multiple times
in the same JVM increases memory usage with each run, and I believe it has
something to do with the HiveContext not reclaiming memory after it finishes
(or with me not shutting it down properly).

It could very well be related to sbt; however, it's not clear to me.



Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

Posted by Yana Kadiyska <ya...@gmail.com>.
The PermGen space error is controlled with the MaxPermSize parameter. I run
with this in my pom, which I think was copied pretty literally from Spark's
own tests... I don't know what the sbt equivalent is, but you should be able
to pass it... possibly via SBT_OPTS?


    <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <version>1.0</version>
        <configuration>
            <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
            <parallel>false</parallel>
            <junitxml>.</junitxml>
            <filereports>SparkTestSuite.txt</filereports>
            <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
            <stderr/>
            <systemProperties>
                <java.awt.headless>true</java.awt.headless>
                <spark.testing>1</spark.testing>
                <spark.ui.enabled>false</spark.ui.enabled>
                <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
            </systemProperties>
        </configuration>
        <executions>
            <execution>
                <id>test</id>
                <goals>
                    <goal>test</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
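
For sbt, I'd guess the rough equivalent of that argLine and those system
properties is something like the following in build.sbt (an untested sketch;
javaOptions only take effect when tests are forked):

    fork in Test := true
    parallelExecution in Test := false  // mirrors <parallel>false</parallel>
    javaOptions in Test ++= Seq(
      "-Xmx3g", "-XX:MaxPermSize=256m", "-XX:ReservedCodeCacheSize=512m",
      "-Dspark.testing=1", "-Dspark.ui.enabled=false",
      "-Dspark.driver.allowMultipleContexts=true")

SBT_OPTS, by contrast, only sets the options of the sbt JVM itself, which is
where the tests run when they are not forked.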

