Posted to issues@spark.apache.org by "Chaitanya (JIRA)" <ji...@apache.org> on 2016/06/27 14:39:52 UTC

[jira] [Reopened] (SPARK-16219) Unable to run Python wordcount

     [ https://issues.apache.org/jira/browse/SPARK-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaitanya reopened SPARK-16219:
-------------------------------

The previous issue has been resolved, but the problem reported here remains: the pi estimation example works, while wordcount still fails.
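
Looking at the trace below, the difference seems to be that the pi example generates its data in memory, while wordcount reads its input through sc.textFile(). That call resolves a bare path against the default filesystem in the Hadoop configuration (fs.defaultFS), and the "Call From ubuntu/127.0.1.1 to localhost:9000" line suggests that points at an HDFS namenode which is not running. A minimal sketch of the same job forced onto the local filesystem (the file:// prefix and the app name here are illustrative, not from the stock example):

from operator import add
from pyspark import SparkContext

sc = SparkContext(appName="WordCountLocalFS")
# An explicit file:// URI bypasses fs.defaultFS and reads from local disk.
counts = (sc.textFile("file:///home/chaitanya/Desktop/dataset/books_17.txt")
            .flatMap(lambda line: line.split(" "))
            .map(lambda word: (word, 1))
            .reduceByKey(add))
for word, count in counts.collect():
    print("%s: %i" % (word, count))
sc.stop()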

> Unable to run Python wordcount
> ------------------------------
>
>                 Key: SPARK-16219
>                 URL: https://issues.apache.org/jira/browse/SPARK-16219
>             Project: Spark
>          Issue Type: Test
>          Components: Examples
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 16.04 LTS
>            Reporter: Chaitanya
>              Labels: beginner, newbie, test
>
> I was trying to run the examples that ship with Spark. I started with pi estimation, and it worked fine for me. Then I tried wordcount with the following command:
> ./bin/spark-submit   examples/src/main/python/wordcount.py    /home/chaitanya/Desktop/dataset/books_17.txt
> and I got the following error:
> 16/06/27 11:48:23 INFO spark.SparkContext: Running Spark version 1.6.1
> 16/06/27 11:48:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/27 11:48:23 WARN util.Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.88.128 instead (on interface ens33)
> 16/06/27 11:48:23 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> 16/06/27 11:48:23 INFO spark.SecurityManager: Changing view acls to: chaitanya
> 16/06/27 11:48:23 INFO spark.SecurityManager: Changing modify acls to: chaitanya
> 16/06/27 11:48:23 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(chaitanya); users with modify permissions: Set(chaitanya)
> 16/06/27 11:48:24 INFO util.Utils: Successfully started service 'sparkDriver' on port 34872.
> 16/06/27 11:48:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 16/06/27 11:48:24 INFO Remoting: Starting remoting
> 16/06/27 11:48:24 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.88.128:42086]
> 16/06/27 11:48:24 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 42086.
> 16/06/27 11:48:24 INFO spark.SparkEnv: Registering MapOutputTracker
> 16/06/27 11:48:24 INFO spark.SparkEnv: Registering BlockManagerMaster
> 16/06/27 11:48:24 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-85800153-c0d8-43e8-bb4f-4d666a336104
> 16/06/27 11:48:24 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
> 16/06/27 11:48:24 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 16/06/27 11:48:24 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 16/06/27 11:48:24 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
> 16/06/27 11:48:24 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
> 16/06/27 11:48:24 INFO ui.SparkUI: Started SparkUI at http://192.168.88.128:4040
> 16/06/27 11:48:24 INFO util.Utils: Copying /opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py to /tmp/spark-8b73cb00-6b0c-4ee9-bdea-2b294ee53e92/userFiles-6d360b48-6988-4fda-9caa-157cac048a6f/wordcount.py
> 16/06/27 11:48:24 INFO spark.SparkContext: Added file file:/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py at file:/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py with timestamp 1467028104914
> 16/06/27 11:48:24 INFO executor.Executor: Starting executor ID driver on host localhost
> 16/06/27 11:48:25 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42019.
> 16/06/27 11:48:25 INFO netty.NettyBlockTransferService: Server created on 42019
> 16/06/27 11:48:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 16/06/27 11:48:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:42019 with 511.5 MB RAM, BlockManagerId(driver, localhost, 42019)
> 16/06/27 11:48:25 INFO storage.BlockManagerMaster: Registered BlockManager
> 16/06/27 11:48:25 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 189.2 KB, free 189.2 KB)
> 16/06/27 11:48:25 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.5 KB, free 210.7 KB)
> 16/06/27 11:48:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:42019 (size: 21.5 KB, free: 511.5 MB)
> 16/06/27 11:48:25 INFO spark.SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
> Traceback (most recent call last):
>   File "/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py", line 34, in <module>
>     .reduceByKey(add)
>   File "/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", line 1558, in reduceByKey
>   File "/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", line 1768, in combineByKey
>   File "/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", line 2169, in _defaultReducePartitions
>   File "/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", line 2363, in getNumPartitions
>   File "/opt/spark-1.6.1-bin-without-hadoop/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/opt/spark-1.6.1-bin-without-hadoop/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o18.partitions.
> : java.net.ConnectException: Call From ubuntu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1480)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
> 	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
> 	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
> 	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
> 	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
> 	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
> 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64)
> 	at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
> 	at py4j.Gateway.invoke(Gateway.java:259)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:209)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1446)
> 	... 45 more
> 16/06/27 11:48:25 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
> 16/06/27 11:48:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
> 16/06/27 11:48:26 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.88.128:4040
> 16/06/27 11:48:26 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/06/27 11:48:26 INFO storage.MemoryStore: MemoryStore cleared
> 16/06/27 11:48:26 INFO storage.BlockManager: BlockManager stopped
> 16/06/27 11:48:26 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
> 16/06/27 11:48:26 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/06/27 11:48:26 INFO spark.SparkContext: Successfully stopped SparkContext
> 16/06/27 11:48:26 INFO util.ShutdownHookManager: Shutdown hook called
> 16/06/27 11:48:26 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8b73cb00-6b0c-4ee9-bdea-2b294ee53e92/pyspark-545595ca-511b-4f7e-a679-886394fbcdad
> 16/06/27 11:48:26 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8b73cb00-6b0c-4ee9-bdea-2b294ee53e92
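> If it helps: the same workaround applies when running the stock example through spark-submit. Passing an explicit file:// URI (untested beyond my setup) keeps the input path from being resolved against HDFS:
> ./bin/spark-submit examples/src/main/python/wordcount.py file:///home/chaitanya/Desktop/dataset/books_17.txt
> Alternatively, starting HDFS and copying the file in with hdfs dfs -put should let the original command work unchanged.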


