Posted to user@spark.apache.org by Paul Tremblay <pa...@gmail.com> on 2017/03/31 00:45:59 UTC

Looking at EMR Logs

I am looking for tips on evaluating my Spark job after it has run.

I know that right now I can look at the history of jobs through the web UI.
I also know how to look at the resources currently being used through a
similar web UI.

However, I would like to look at the logs after the job is finished to
evaluate such things as how many tasks were completed, how many executors
were used, etc. I currently save my logs to S3.

Thanks!

Henry

-- 
Paul Henry Tremblay
Robert Half Technology

Re: Looking at EMR Logs

Posted by Paul Tremblay <pa...@gmail.com>.
Thanks. That seems to work great, except EMR doesn't always copy the logs
to S3. The behavior seems inconsistent and I am debugging it now.



-- 
Paul Henry Tremblay
Robert Half Technology

Re: Looking at EMR Logs

Posted by Neil Jonkers <ne...@gmail.com>.
If you modify spark.eventLog.dir to point to an S3 path, you will encounter
the following exception in the Spark history server log at
/var/log/spark/spark-history-server.out:


Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2702)

To get past this issue, symlink the EMRFS jar into Spark's jars directory.
This is for EMR release emr-5.4.0:

cd /usr/lib/spark/jars
sudo ln -s /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-2.15.0.jar emrfs.jar

Now the Spark history server will start up correctly and you can review the
Spark event logs on S3.
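
The jar name above is tied to emr-5.4.0. As a rough sketch only, the same
link could be made without hard-coding the version. The function name
`link_emrfs_jar` is made up for illustration, and it is an assumption that
other EMR releases keep the same directory and the same
`emrfs-hadoop-assembly-*.jar` naming. It also assumes you have write access
to the jars directory (the original command uses sudo).

```shell
# Hypothetical helper: symlink the EMRFS assembly jar into Spark's jars
# directory without hard-coding the jar version.
# Both default paths are taken from the commands above for emr-5.4.0;
# whether they hold on other releases is an assumption.
link_emrfs_jar() {
  emrfs_lib="${1:-/usr/share/aws/emr/emrfs/lib}"
  spark_jars="${2:-/usr/lib/spark/jars}"
  for jar in "$emrfs_lib"/emrfs-hadoop-assembly-*.jar; do
    # If the glob matched nothing, the pattern is left as a literal string.
    [ -e "$jar" ] || {
      echo "no emrfs-hadoop-assembly jar found in $emrfs_lib" >&2
      return 1
    }
    # Link the first match (glob order) and stop.
    ln -sf "$jar" "$spark_jars/emrfs.jar"
    return 0
  done
}
```

Called with no arguments it uses the two paths from this message; pass other
directories as the first and second argument if your layout differs.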



Re: Looking at EMR Logs

Posted by Vadim Semenov <va...@datadoghq.com>.
You can provide your own log directory where the Spark event log will be
saved, and replay it afterwards.

Set `spark.eventLog.dir=s3://bucket/some/directory` in your job and run it.
Note: the path `s3://bucket/some/directory` must exist before you run your
job; it will not be created automatically.
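
One way to pass that setting is on the command line at submit time. This is
only a sketch: the wrapper name `submit_with_event_log` and the job file
`my_job.py` are placeholders, not anything from this thread. It also sets
`spark.eventLog.enabled=true`, which is my addition; EMR's defaults appear to
enable event logging already, but event logs are only written when that
setting is true, so stating it explicitly should be harmless.

```shell
# Hypothetical wrapper: run spark-submit with event logging sent to a
# chosen directory. The directory must already exist.
submit_with_event_log() {
  log_dir="$1"   # e.g. s3://bucket/some/directory
  shift          # remaining arguments go to spark-submit unchanged
  spark-submit \
    --conf spark.eventLog.enabled=true \
    --conf "spark.eventLog.dir=${log_dir}" \
    "$@"
}
```

Usage would look like `submit_with_event_log s3://bucket/some/directory my_job.py`.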

The Spark HistoryServer on EMR won't show these logs because it looks for
event logs in `hdfs:///var/log/spark/apps` by default.

After the job has run, you can either copy the log files from S3 to the HDFS
path above, or copy them locally to `/tmp/spark-events` (the default
directory for Spark event logs) and run the history server like:
```
cd /usr/local/src/spark-1.6.1-bin-hadoop2.6
sbin/start-history-server.sh
```
and then open http://localhost:18080
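
If you only want the raw counts asked about (tasks, executors) without the
UI, the event log itself can be read directly: it is a text file with one
JSON object per line, each carrying an "Event" field. A minimal sketch,
assuming the log has already been copied locally, is not compressed, and has
its JSON written without spaces around the colon. The function name
`summarize_event_log` is made up for illustration.

```shell
# Hypothetical helper: count selected listener events in one Spark event log.
# SparkListenerTaskEnd is emitted for every task attempt that ends,
# whether it succeeded or failed, so it is an upper bound on completed tasks.
summarize_event_log() {
  log_file="$1"
  for event in SparkListenerTaskEnd SparkListenerExecutorAdded SparkListenerJobEnd; do
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
    count=$(grep -c "\"Event\":\"${event}\"" "$log_file" || true)
    printf '%s %s\n' "$event" "$count"
  done
}
```

Run it against one file under `/tmp/spark-events`; it prints one line per
event type with the number of matching log lines.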



