Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2014/08/14 11:14:11 UTC
[jira] [Commented] (SPARK-3035) Wrong example with SparkContext.addFile
[ https://issues.apache.org/jira/browse/SPARK-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096773#comment-14096773 ]
Apache Spark commented on SPARK-3035:
-------------------------------------
User 'iAmGhost' has created a pull request for this issue:
https://github.com/apache/spark/pull/1942
> Wrong example with SparkContext.addFile
> ---------------------------------------
>
> Key: SPARK-3035
> URL: https://issues.apache.org/jira/browse/SPARK-3035
> Project: Spark
> Issue Type: Documentation
> Components: PySpark
> Affects Versions: 1.0.2
> Reporter: Daehan Kim
> Priority: Trivial
> Labels: documentation
> Fix For: 1.0.2
>
>
> {code:title="context.py"}
> def addFile(self, path):
>     """
>     ...
>     >>> from pyspark import SparkFiles
>     >>> path = os.path.join(tempdir, "test.txt")
>     >>> with open(path, "w") as testFile:
>     ...    testFile.write("100")
>     >>> sc.addFile(path)
>     >>> def func(iterator):
>     ...    with open(SparkFiles.get("test.txt")) as testFile:
>     ...        fileVal = int(testFile.readline())
>     ...        return [x * 100 for x in iterator]
>     >>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
>     [100, 200, 300, 400]
>     """
> {code}
> This example writes 100 to a temp file, distributes the file, and is meant to use its value when multiplying the input (to show that worker nodes can read the distributed file).
> But look at these lines: the result is never affected by the distributed file:
> {code}
> ...        fileVal = int(testFile.readline())
> ...        return [x * 100 for x in iterator]
> {code}
> I'm sure this code was intended to be like this:
> {code}
> ...        fileVal = int(testFile.readline())
> ...        return [x * fileVal for x in iterator]
> {code}
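> For reference, a minimal standalone sketch of the corrected pattern. This is an illustration only, not the actual doctest: the local SparkContext and the temp-directory setup are assumed here, since the doctest relies on a pre-existing {{sc}} and {{tempdir}}:
> {code:title="addfile_demo.py"}
> import os
> import tempfile
>
> from pyspark import SparkContext, SparkFiles
>
> # Assumed setup: a local SparkContext; the doctest gets `sc` and
> # `tempdir` from the test harness instead.
> sc = SparkContext("local", "addFile-demo")
> tempdir = tempfile.mkdtemp()
>
> # Write the multiplier to a file and ship it to every worker.
> path = os.path.join(tempdir, "test.txt")
> with open(path, "w") as testFile:
>     testFile.write("100")
> sc.addFile(path)
>
> def func(iterator):
>     # Read the distributed copy on the worker; with the fix the
>     # result actually depends on the file's contents.
>     with open(SparkFiles.get("test.txt")) as testFile:
>         fileVal = int(testFile.readline())
>         return [x * fileVal for x in iterator]
>
> print(sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect())
> # [100, 200, 300, 400]
> {code}
> With this version, changing the contents of test.txt changes the output, which the original example never would.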