Posted to dev@spark.apache.org by Jason White <ja...@shopify.com> on 2016/02/19 03:07:41 UTC

How to run PySpark tests?

Hi,

I'm trying to finish up a PR (https://github.com/apache/spark/pull/10089)
which is currently failing PySpark tests. The instructions to run the test
suite seem a little dated. I was able to find these:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
http://spark.apache.org/docs/latest/building-spark.html

I've tried running `python/run-tests`, but it fails hard at the ORC tests. I
suspect it has to do with the external libraries not being compiled or put
in the right location.
I've tried running `SPARK_TESTING=1 ./bin/pyspark
python/pyspark/streaming/tests.py` as suggested, but this doesn't work on
Spark 2.0.
I've tried running `SPARK_TESTING=1 ./bin/spark-submit
python/pyspark/streaming/tests.py`, and that worked a little better, but it
failed at `pyspark.streaming.tests.KafkaStreamTests` with
`java.lang.ClassNotFoundException:
org.apache.spark.streaming.kafka.KafkaTestUtils`. I suspect it's the same
issue with the external libraries.

I've been compiling Spark with `build/mvn -Pyarn -Phadoop-2.4
-Dhadoop.version=2.4.0 -DskipTests clean package` with no trouble.

Is there any better documentation somewhere about how to run the PySpark
tests?



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-run-PySpark-tests-tp16357.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: How to run PySpark tests?

Posted by Holden Karau <ho...@pigscanfly.ca>.
Or wait, I don't have access to the wiki. If anyone can give me wiki access,
I'll update the instructions.

On Thu, Feb 18, 2016 at 8:45 PM, Holden Karau <ho...@pigscanfly.ca> wrote:

> Great - I'll update the wiki.



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: How to run PySpark tests?

Posted by Holden Karau <ho...@pigscanfly.ca>.
Great - I'll update the wiki.

On Thu, Feb 18, 2016 at 8:34 PM, Jason White <ja...@shopify.com>
wrote:

> Compiling with `build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0
> -DskipTests clean package` followed by `python/run-tests` seemed to do the
> trick! Thanks!



Re: How to run PySpark tests?

Posted by Jason White <ja...@shopify.com>.
Compiling with `build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0
-DskipTests clean package` followed by `python/run-tests` seemed to do the
trick! Thanks!
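For illustration only: a minimal Python sketch that scripts the two steps that worked here (the Maven build with `-Phive`, then `python/run-tests`). It assumes it is run from the root of a Spark source checkout of this vintage, where `build/mvn` and `python/run-tests` exist. The helper names `pyspark_test_steps` and `run_steps` are hypothetical, and the Hadoop profile and version are simply the ones used in this thread.

```python
import subprocess
import sys

# Maven build from the thread. -Phive is the profile that was missing from the
# original build; the Hadoop profile/version are the poster's and may differ for you.
BUILD_CMD = [
    "build/mvn", "-Pyarn", "-Phadoop-2.4", "-Phive",
    "-Dhadoop.version=2.4.0", "-DskipTests", "clean", "package",
]
# Test runner script in the Spark source tree.
TEST_CMD = ["python/run-tests"]


def pyspark_test_steps():
    """Return the commands to run, in order: build first, then the tests."""
    return [BUILD_CMD, TEST_CMD]


def run_steps(steps):
    """Run each command in order, stopping at the first failure.

    Returns 0 if every command succeeded, otherwise the failing command's
    exit code. Raises FileNotFoundError if a command does not exist
    (e.g. when not run from the Spark checkout root).
    """
    for cmd in steps:
        # Output is inherited from this process, so Maven/test logs stream live.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("failed: " + " ".join(cmd), file=sys.stderr)
            return result.returncode
    return 0
```

Usage, from the Spark checkout root: `run_steps(pyspark_test_steps())`. Stopping at the first non-zero exit code avoids running the tests against a failed or stale build.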





Re: How to run PySpark tests?

Posted by Holden Karau <ho...@pigscanfly.ca>.
I've run into problems with the Python tests in the past when I hadn't
built with Hive support. You might want to build your assembly with Hive
support (`-Phive`) and see if that helps.

