You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "ABHISHEK CHOUDHARY (JIRA)" <ji...@apache.org> on 2015/08/24 21:18:46 UTC

[jira] [Created] (SPARK-10189) python rdd socket connection problem

ABHISHEK CHOUDHARY created SPARK-10189:
------------------------------------------

             Summary: python rdd socket connection problem
                 Key: SPARK-10189
                 URL: https://issues.apache.org/jira/browse/SPARK-10189
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.4.1
            Reporter: ABHISHEK CHOUDHARY


I am trying to use wholeTextFiles with pyspark , and now I am getting the same error -

```
textFiles = sc.wholeTextFiles('/file/content')
textFiles.take(1)
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/work/bigdata/CHD5.4/spark-1.4.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1277, in take
    res = self.context.runJob(self, takeUpToNumLeft, p, True)
  File "/Volumes/work/bigdata/CHD5.4/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 898, in runJob
    return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
  File "/Volumes/work/bigdata/CHD5.4/spark-1.4.1-bin-hadoop2.6/python/pyspark/rdd.py", line 138, in _load_from_socket
    raise Exception("could not open socket")
Exception: could not open socket
>>> 15/08/24 20:09:27 ERROR PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:623)
```

Current piece of code in rdd.py-

```
def _load_from_socket(port, serializer):
    sock = None
    # Support for both IPv4 and IPv6.
    # On most of IPv6-ready systems, IPv6 will take precedence.
    for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(3)
            sock.connect(sa)
        except socket.error:
            sock = None
            continue
        break
    if not sock:
        raise Exception("could not open socket")
    try:
        rf = sock.makefile("rb", 65536)
        for item in serializer.load_stream(rf):
            yield item
    finally:
        sock.close()
```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org