Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/20 16:57:40 UTC

[jira] [Resolved] (SPARK-4099) env var HOME not set correctly

     [ https://issues.apache.org/jira/browse/SPARK-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-4099.
------------------------------
    Resolution: Not A Problem

> env var HOME not set correctly
> ------------------------------
>
>                 Key: SPARK-4099
>                 URL: https://issues.apache.org/jira/browse/SPARK-4099
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.1.0
>            Reporter: Radim Rehurek
>            Priority: Minor
>
> The HOME environment variable is not set properly in PySpark jobs. For example, on a Spark cluster set up on AWS, `os.environ["HOME"]` gives "/home" rather than the correct "/home/hadoop".
> One consequence is that some Python packages, including NLTK, don't work: they rely on HOME to locate the internal data they store there.
> I assume this problem has to do with the way Spark launches the job processes (no shell).
> The fix is simple: users have to set `os.environ["HOME"]` manually, before importing said packages (see the sketch after this report).
> But this is pretty non-intuitive and may be hard for some users to figure out. I think it's better to set HOME directly on the Spark side; that would make NLTK (and others) work out of the box.
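A minimal sketch of the workaround described in the report, as one might apply it on the AWS setup mentioned there. Deriving HOME from the passwd database via the `pwd` module (Unix-only) is an assumption of this sketch, not something the report prescribes; the report only says to set `os.environ["HOME"]` by hand.

    import os
    import pwd

    # On an affected cluster this prints the truncated value, e.g. "/home".
    print(os.environ.get("HOME"))

    # Workaround for SPARK-4099: set HOME explicitly before importing
    # packages that read it at import time. Looking the value up in the
    # passwd entry for the current user avoids hard-coding "/home/hadoop".
    os.environ["HOME"] = pwd.getpwuid(os.getuid()).pw_dir

    import nltk  # NLTK now resolves its ~/nltk_data directory correctly

The import has to come after HOME is set, because NLTK builds its data search path (which includes ~/nltk_data) when it is first imported.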



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org