Posted to user@spark.apache.org by Han-Cheol Cho <ha...@nhn-techorus.com> on 2017/03/02 08:30:38 UTC
strange usage of tempfile.mkdtemp() in PySpark mllib.recommendation doctest
Dear Spark user mailing list members,
In PySpark's mllib.recommendation doctest, I found a somewhat strange use of
the temporary directory creation function, tempfile.mkdtemp(), in the following
part.
# https://github.com/apache/spark/blob/master/python/pyspark/mllib/recommendation.py
...
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
As I understand it, calling tempfile.mkdtemp() creates a temporary
directory on the LOCAL machine.
However, model.save(sc, path) saves the model data in HDFS.
As a result, the doctest removes only the LOCAL temp directory using shutil.rmtree().
Shouldn't we delete the temporary directory in HDFS too?
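To make the question concrete, here is a minimal sketch of what I mean. The local part only confirms that mkdtemp() and rmtree() operate on this machine's filesystem. The helper remove_on_default_fs() is a hypothetical name of my own, not part of PySpark; it assumes a live SparkContext and relies on the private attributes sc._jvm and sc._jsc to reach the Hadoop FileSystem API, so please treat it as an illustration rather than a proposed patch.

```python
import os
import shutil
import tempfile


def remove_on_default_fs(sc, path):
    """Hypothetical helper: recursively delete `path` on the filesystem
    that Hadoop resolves it against (HDFS if that is the cluster default).

    Assumes `sc` is a live SparkContext. `_jvm` and `_jsc` are private
    PySpark attributes, so this is a sketch, not a supported API.
    """
    hadoop_path = sc._jvm.org.apache.hadoop.fs.Path(path)
    fs = hadoop_path.getFileSystem(sc._jsc.hadoopConfiguration())
    return fs.delete(hadoop_path, True)  # True = delete recursively


# The local half needs no Spark: mkdtemp() creates a directory on this machine.
local_dir = tempfile.mkdtemp()
created_locally = os.path.isdir(local_dir)

# rmtree() likewise only touches the local filesystem.
shutil.rmtree(local_dir)
removed_locally = not os.path.exists(local_dir)
```

In the doctest, something like remove_on_default_fs(sc, path) would be called after the predictions, in addition to the existing rmtree(path).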
Best wishes,
HanCheol
Han-Cheol Cho Data Laboratory / Data Scientist Shinjuku Eastside Square 13F, 6-27-30 Shinjuku, Shinjuku-ku, Tokyo 160-0022, Japan
Email hancheol.cho@nhn-techorus.com