Posted to issues@spark.apache.org by "Eric O. LEBIGOT (EOL) (JIRA)" <ji...@apache.org> on 2017/03/12 11:32:04 UTC
[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906497#comment-15906497 ]
Eric O. LEBIGOT (EOL) commented on SPARK-18281:
-----------------------------------------------
Same here: I see the problem with the latest version (2.1.0) too. The problem appears randomly for me (I haven't built a minimal example, but it looks very much like the one reported here: toLocalIterator is used, etc.).
> toLocalIterator yields time out error on pyspark2
> -------------------------------------------------
>
> Key: SPARK-18281
> URL: https://issues.apache.org/jira/browse/SPARK-18281
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.0.1
> Environment: Ubuntu 14.04.5 LTS
> Driver: AWS m4.xlarge
> Slaves: AWS m4.4xlarge
> mesos 1.0.1
> spark 2.0.1
> pyspark
> Reporter: Luke Miner
> Assignee: Liang-Chi Hsieh
> Fix For: 2.0.3, 2.1.1
>
>
> I run the example straight out of the API docs for toLocalIterator and it gives a timeout exception:
> {code}
> from pyspark import SparkContext
> sc = SparkContext()
> rdd = sc.parallelize(range(10))
> [x for x in rdd.toLocalIterator()]
> {code}
> conf file:
> spark.driver.maxResultSize 6G
> spark.executor.extraJavaOptions -XX:+UseG1GC -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError
> spark.executor.memory 16G
> spark.executor.uri foo/spark-2.0.1-bin-hadoop2.7.tgz
> spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
> spark.hadoop.fs.s3a.buffer.dir /raid0/spark
> spark.hadoop.fs.s3n.buffer.dir /raid0/spark
> spark.hadoop.fs.s3a.connection.timeout 500000
> spark.hadoop.fs.s3n.multipart.uploads.enabled true
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
> spark.hadoop.parquet.block.size 2147483648
> spark.hadoop.parquet.enable.summary-metadata false
> spark.jars.packages com.databricks:spark-avro_2.11:3.0.1,com.amazonaws:aws-java-sdk-pom:1.10.34
> spark.local.dir /raid0/spark
> spark.mesos.coarse false
> spark.mesos.constraints priority:1
> spark.network.timeout 600
> spark.rpc.message.maxSize 500
> spark.speculation false
> spark.sql.parquet.mergeSchema false
> spark.sql.planner.externalSort true
> spark.submit.deployMode client
> spark.task.cpus 1
> Exception here:
> {code}
> ---------------------------------------------------------------------------
> timeout Traceback (most recent call last)
> <ipython-input-1-6319dd276401> in <module>()
> 2 sc = SparkContext()
> 3 rdd = sc.parallelize(range(10))
> ----> 4 [x for x in rdd.toLocalIterator()]
> /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/rdd.pyc in _load_from_socket(port, serializer)
> 140 try:
> 141 rf = sock.makefile("rb", 65536)
> --> 142 for item in serializer.load_stream(rf):
> 143 yield item
> 144 finally:
> /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/serializers.pyc in load_stream(self, stream)
> 137 while True:
> 138 try:
> --> 139 yield self._read_with_length(stream)
> 140 except EOFError:
> 141 return
> /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/serializers.pyc in _read_with_length(self, stream)
> 154
> 155 def _read_with_length(self, stream):
> --> 156 length = read_int(stream)
> 157 if length == SpecialLengths.END_OF_DATA_SECTION:
> 158 raise EOFError
> /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/serializers.pyc in read_int(stream)
> 541
> 542 def read_int(stream):
> --> 543 length = stream.read(4)
> 544 if not length:
> 545 raise EOFError
> /usr/lib/python2.7/socket.pyc in read(self, size)
> 378 # fragmentation issues on many platforms.
> 379 try:
> --> 380 data = self._sock.recv(left)
> 381 except error, e:
> 382 if e.args[0] == EINTR:
> timeout: timed out
> {code}
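For anyone who wants to try this end to end, here is a self-contained sketch of the reported setup. It is illustrative only: the two timeout-related settings are copied from the conf file above, and it is an assumption (not something confirmed in this ticket) that they influence this particular failure.
{code}
from pyspark import SparkConf, SparkContext

# Sketch of the reported setup; values taken from the conf file above.
# Whether these settings affect the toLocalIterator timeout is an assumption.
conf = (SparkConf()
        .set("spark.network.timeout", "600s")
        .set("spark.rpc.message.maxSize", "500"))
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(10))

# The reported timeout surfaces here, while the driver reads results back
# over the local socket opened for toLocalIterator (see _load_from_socket
# in the traceback above).
items = [x for x in rdd.toLocalIterator()]
print(items)

sc.stop()
{code}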
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org