Posted to user@spark.apache.org by Andrew Lee <al...@hotmail.com> on 2014/07/28 21:40:26 UTC

spark-shell and spark-submit behave differently on the spark-defaults.conf parameter spark.eventLog.dir

Hi All,
Not sure if anyone has run into this problem, but it exists in Spark 1.0.0 when you specify the location in conf/spark-defaults.conf as
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to use the $USER env variable.
For example, I'm running the command as user 'test'.
With spark-submit, the folder is created on the fly, and you will see the event logs created on HDFS at /user/test/spark/logs/spark-pi-1405097484152.
With spark-shell, however, the 'test' folder is not created; instead you will see /user/$USER/spark/logs on HDFS. That is, it tries to create a literal /user/$USER/spark/logs instead of /user/test/spark/logs.
It looks like spark-shell doesn't pick up the env variable $USER when resolving the eventLog directory for the running user 'test'.
Is this a bug, or is it bad practice to use spark-shell with Spark's HistoryServer?
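To make the difference concrete, here is roughly what a directory listing shows in each case (a minimal sketch; paths are taken from the description above, and 'hdfs dfs' assumes a Hadoop 2 client):

  # after a spark-submit run as user 'test'
  hdfs dfs -ls /user/test/spark/logs
  # shows: /user/test/spark/logs/spark-pi-1405097484152

  # after a spark-shell run as the same user
  # (path single-quoted so the login shell does not expand $USER here)
  hdfs dfs -ls '/user/$USER/spark/logs'
  # shows the event log under a directory literally named '$USER'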

RE: spark-shell and spark-submit behave differently on the spark-defaults.conf parameter spark.eventLog.dir

Posted by Andrew Lee <al...@hotmail.com>.
Hi Andrew,
Thanks for confirming the problem. I thought it only happened with my own build. :)
By the way, we have multiple users using spark-shell to explore their datasets, and we are continuously looking into ways to isolate their job history. In the current situation, we can't really ask them to create their own spark-defaults.conf, since that file is read-only for them. A workaround is to point the log directory at a shared folder, e.g. /user/spark/logs with permissions 1777, as sketched below. This isn't really ideal, since other people can see what jobs are running on the shared cluster.
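For reference, the shared-folder setup is just (a sketch; 'hdfs dfs' assumes a Hadoop 2 client, older clients use 'hadoop fs' with the same arguments):

  # world-writable event log directory with the sticky bit set,
  # so users can write their own logs but not delete each other's
  hdfs dfs -mkdir -p /user/spark/logs
  hdfs dfs -chmod 1777 /user/spark/logs

The sticky bit protects the files themselves, but directory listings (and thus job names) remain visible to everyone, which is the exposure mentioned above.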
It would be nice to have better security here, so people aren't exposing their algorithms (which are usually embedded in their jobs' names) to other users.
Is there, or will there be, a JIRA ticket to track this? Any plan to enhance this part of spark-shell?


Date: Mon, 28 Jul 2014 13:54:56 -0700
Subject: Re: spark-shell and spark-submit behave differently on the spark-defaults.conf parameter spark.eventLog.dir
From: andrew@databricks.com
To: user@spark.apache.org

Hi Andrew,
It's definitely not bad practice to use spark-shell with the HistoryServer. The issue here is not with spark-shell, but with the way we pass Spark configs to the application. spark-defaults.conf does not currently support embedding environment variables; it interprets everything as a string literal. You will have to manually specify "test" instead of "$USER" in the path you provide to spark.eventLog.dir.

-Andrew


Re: spark-shell and spark-submit behave differently on the spark-defaults.conf parameter spark.eventLog.dir

Posted by Andrew Or <an...@databricks.com>.
Hi Andrew,

It's definitely not bad practice to use spark-shell with the HistoryServer.
The issue here is not with spark-shell, but with the way we pass Spark
configs to the application. spark-defaults.conf does not currently support
embedding environment variables; it interprets everything as a string
literal. You will have to manually specify "test" instead of "$USER" in the
path you provide to spark.eventLog.dir.
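If you need per-user paths, one workaround (a sketch, not a supported substitution mechanism) is to let the login shell expand $USER before Spark parses the value, by passing the property on the command line instead of in spark-defaults.conf:

  # the shell expands $USER before Spark sees the value
  spark-shell --conf spark.eventLog.dir=hdfs:///user/$USER/spark/logs

  # on versions whose spark-shell does not accept --conf (e.g. 1.0.x),
  # the same property can be set as a JVM system property for a
  # client-mode driver:
  spark-shell --driver-java-options "-Dspark.eventLog.dir=hdfs:///user/$USER/spark/logs"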

-Andrew

