Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/20 16:57:40 UTC
[jira] [Resolved] (SPARK-4099) env var HOME not set correctly
[ https://issues.apache.org/jira/browse/SPARK-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-4099.
------------------------------
Resolution: Not A Problem
> env var HOME not set correctly
> ------------------------------
>
> Key: SPARK-4099
> URL: https://issues.apache.org/jira/browse/SPARK-4099
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.1.0
> Reporter: Radim Rehurek
> Priority: Minor
>
> The HOME environment variable is not set properly in PySpark jobs. For example, when setting up a Spark cluster on AWS, `os.environ["HOME"]` returns "/home" rather than the correct "/home/hadoop".
> One consequence is that some Python packages (including NLTK) don't work, because they rely on HOME to store internal data.
> I assume this problem has to do with the way Spark launches the job processes (no shell).
> The fix is simple: users have to set `os.environ["HOME"]` manually before importing said packages.
> But it's pretty non-intuitive and may be hard for some users to figure out. I think it's better to set HOME directly on the Spark side. That would make NLTK (and others) work out of the box.
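The workaround described in the report can be sketched as follows. This is a minimal illustration, not code from the issue; the "/home/hadoop" path is specific to the AWS setup the reporter mentions, and `os.path.expanduser` stands in for any library (such as NLTK) that reads HOME at import time.

```python
import os

# Simulate the broken environment described in the report: HOME points
# at "/home" instead of the user's actual home directory.
os.environ["HOME"] = "/home"
print(os.path.expanduser("~"))  # resolves via HOME -> "/home"

# Workaround: fix HOME *before* importing packages that read it at
# import time (NLTK stores its data paths based on HOME, for example).
# "/home/hadoop" is the correct value for the AWS setup in the report;
# adjust it to the home of the user running the executors.
os.environ["HOME"] = "/home/hadoop"
print(os.path.expanduser("~"))  # now resolves to "/home/hadoop"
```

Because the fix must run before the affected imports, it typically belongs at the very top of the job script shipped to the executors.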
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org