Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/09/24 09:38:33 UTC

[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example

    [ https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146017#comment-14146017 ] 

Sean Owen commented on SPARK-3662:
----------------------------------

Maybe I'm missing something, but does this just mean you can't "import pandas" at all? If you're modifying the example, you should import only what you need from pandas. Or it may indeed be that you need to move the "import random" statement to accommodate the other modifications you want to make.

But what is the problem with the included example? It runs fine without modifications, no?
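For reference, the workaround the reporter describes below (moving the import into f) can be sketched against the pi.py example. This is a local stand-in with no SparkContext, so the sc and slices names from the original example are replaced by a plain map/sum:

```python
from operator import add  # pi.py reduces with operator.add

def f(_):
    # Import inside the function: the name is then resolved at call
    # time on the worker, sidestepping any module-level "random" that
    # a heavyweight import like pandas could shadow.
    from random import random
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0

# Local stand-in for sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add):
n = 100000
count = sum(map(f, range(1, n + 1)))
print("Pi is roughly %f" % (4.0 * count / n))
```

The only change from the shipped pi.py is that "from random import random" moves from the top of the file into f's body.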

> Importing pandas breaks included pi.py example
> ----------------------------------------------
>
>                 Key: SPARK-3662
>                 URL: https://issues.apache.org/jira/browse/SPARK-3662
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.1.0
>         Environment: Xubuntu 14.04.  Yarn cluster running on Ubuntu 12.04.
>            Reporter: Evan Samanas
>
> If I add "import pandas" at the top of the included pi.py example and submit using "spark-submit --master yarn-client", I get this stack trace:
> {code}
> Traceback (most recent call last):
>   File "/home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py", line 39, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 759, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 723, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: org.apache.spark.api.python.PythonException (Traceback (most recent call last):
>   File "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py", line 75, in main
>     command = pickleSer._read_with_length(infile)
>   File "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py", line 150, in _read_with_length
>     return self.loads(obj)
> ImportError: No module named algos
> {code}
> The example works fine if I move the statement "from random import random" from the top of the file into the function (def f(_)) defined in the example. As near as I can tell, "random" is getting confused with a function of the same name within pandas.algos.
> Submitting the same script using --master local works, but gives a distressing amount of random characters to stdout or stderr and messes up my terminal:
> {code}
> ...
> @J@J@J@J@J@J@J@J@J@J@J@J@J@JJ@J@J@J@J @J!@J"@J#@J$@J%@J&@J'@J(@J)@J*@J+@J,@J-@J.@J/@J0@J1@J2@J3@J4@J5@J6@J7@J8@J9@J:@J;@J<@J...@J>@J?@J@@JA@JB@JC@JD@JE@JF@JG@JH@JI@JJ@JK@JL@JM@JN@JO@JP@JQ@JR@JS@JT@JU@JV@JW@JX@JY@JZ@J[@J\@J]@J^@J_@J`@Ja@Jb@Jc@Jd@Je@Jf@Jg@Jh@Ji@Jj@Jk@Jl@Jm@Jn@Jo@Jp@Jq@Jr@Js@Jt@Ju@Jv@Jw@Jx@Jy@Jz@J{@J|@J}@J~@J@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JAJAJAJAJAJAJAJAAJ       AJ
> AJ
>   AJ
> AJAJAJAJAJAJAJAJAJAJAJAJAJAJJAJAJAJAJ AJ!AJ"AJ#AJ$AJ%AJ&AJ'AJ(AJ)AJ*AJ+AJ,AJ-AJ.AJ/AJ0AJ1AJ2AJ3AJ4AJ5AJ6AJ7AJ8AJ9AJ:AJ;AJ<AJ=AJ>AJ?AJ@AJAAJBAJCAJDAJEAJFAJGAJHAJIAJJAJKAJLAJMAJNAJOAJPAJQAJRAJSAJTAJUAJVAJWAJXAJYAJZAJ[AJ\AJ]AJ^AJ_AJ`AJaAJbAJcAJdAJeAJfAJgAJhAJiAJjAJkAJlAJmAJnAJoAJpAJqAJrAJsAJtAJuAJvAJwAJxAJyAJzAJ{AJ|AJ}AJ~AJAJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�A14/09/23 15:42:09 INFO SparkContext: Job finished: reduce at /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took 11.276879779 s
> J�AJ�AJ�AJ�AJ�AJ�AJ�AJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJBJBJBJBJBJBJBJBBJ     BJ
> BJ
>   BJ
> BJBJBJBJBJBJBJBJBJBJBJBJBJBJJBJBJBJBJ BJ!BJ"BJ#BJ$BJ%BJ&BJ'BJ(BJ)BJ*BJ+BJ,BJ-BJ.BJ/BJ0BJ1BJ2BJ3BJ4BJ5BJ6BJ7BJ8BJ9BJ:BJ;BJ<BJ=BJ>BJ?BJ@Be.
> �]qJ#1a.
> �]qJX4a.
> �]qJX4a.
> �]qJ#1a.
> �]qJX4a.
> �]qJX4a.
> �]qJ#1a.
> �]qJX4a.
> �]qJX4a.
> �]qJa.
> Pi is roughly 3.146136
> {code}
> No idea if that's related, but thought I'd include it.
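The shadowing described above can be illustrated without Spark or pandas. Python resolves a function's global names at call time, so a later rebinding of the module-level name (here a deliberate `random = None`, standing in for the pandas.algos collision) breaks a module-level import, while a function-local import is re-resolved through the import machinery and keeps working:

```python
import random  # module-level binding of the name "random"

def module_level_style():
    # Looks up "random" in module globals at call time.
    return random.random()

def function_local_style():
    # Re-imports at call time via sys.modules, so rebinding the
    # module-level name cannot break this lookup.
    from random import random
    return random()

random = None  # deliberate shadowing, standing in for the collision

try:
    module_level_style()
except AttributeError as e:
    print("module-level lookup broke:", e)

print("function-local lookup still works:", function_local_style())
```

This is why moving the import into f (as the reporter did) is robust: nothing that happens to the module's namespace during deserialization can change what the local import binds.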



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org