Posted to user@spark.apache.org by saatvikshah1994 <sa...@gmail.com> on 2017/07/18 00:49:44 UTC

Spark UI crashes on Large Workloads

Hi,

I have a PySpark app which, when given a very large amount of input data,
sometimes throws the error explained here:
https://stackoverflow.com/questions/32340639/unable-to-understand-error-sparklistenerbus-has-already-stopped-dropping-event.
All my code runs inside the main function, and the only slightly
unusual thing the app does is use a custom PySpark ML
Transformer (modified from
https://stackoverflow.com/questions/32331848/create-a-custom-transformer-in-pyspark-ml).
Could this be the issue? How can I debug why this is happening?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark UI crashes on Large Workloads

Posted by Saatvik Shah <sa...@gmail.com>.
Hi Riccardo,

Thanks for your suggestions.
The thing is that the Spark UI is the only thing crashing, not the
app. In fact, the app does end up completing successfully.
That's why I'm a bit confused by this issue.
I'll still try out some of your suggestions.
Thanks and Regards,
Saatvik Shah


On Tue, Jul 18, 2017 at 9:59 AM, Riccardo Ferrari <fe...@gmail.com>
wrote:



-- 
*Saatvik Shah,*
*Masters in the School of Computer Science,*
*Carnegie Mellon University,*
*LinkedIn <https://www.linkedin.com/in/saatvikshah/>, Website
<https://saatvikshah1994.github.io/>*

Re: Spark UI crashes on Large Workloads

Posted by Riccardo Ferrari <fe...@gmail.com>.
The reason you get connection refused when connecting to the application UI
(port 4040) is that your app gets stopped, and thus the application UI stops
as well. To inspect your executor logs after the fact, you might find
the Spark History Server useful
<https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact>
(for standalone mode).

Personally, I collect the logs from my worker nodes. They generally sit
under $SPARK_HOME/work/<app-id>/<executor-number> (for standalone).
There you can find exceptions and messages from the executors assigned to
your app.
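A minimal sketch of scanning those directories from a worker machine, assuming
a standalone deployment; the /opt/spark fallback path is just a placeholder:

```python
import os

# Walk the standalone worker's work directory and print executor log paths.
# The layout $SPARK_HOME/work/<app-id>/<executor-number>/{stdout,stderr}
# is the standalone-mode default; adjust for your deployment.
work_dir = os.path.join(os.environ.get("SPARK_HOME", "/opt/spark"), "work")
for root, _dirs, files in os.walk(work_dir):
    for name in files:
        if name in ("stdout", "stderr"):
            print(os.path.join(root, name))
```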

Now, about your app crashing: it might be useful to check whether it is sized
correctly. The issue you linked sounds relevant, but I would give
some sanity checks a try first. I have solved many issues just by resizing an
app: first check memory sizes, CPU allocations, and so on.
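A sanity-check pass over the submit-time sizing might look like the following.
All values here are illustrative placeholders, not recommendations, and
my_app.py is a hypothetical script name:

```shell
# Placeholder sizing to sanity-check before digging deeper;
# tune these numbers to your own cluster and workload.
spark-submit \
  --driver-memory 8g \
  --executor-memory 4g \
  --executor-cores 2 \
  my_app.py
```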

Best,

On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah <sa...@gmail.com>
wrote:


Re: Spark UI crashes on Large Workloads

Posted by Saatvik Shah <sa...@gmail.com>.
Hi Riccardo,

Yes, thanks for suggesting that.

[Stage 1:==========================================>       (12750 + 40) /
15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus:
Dropping SparkListenerEvent because no remaining room in event queue. This
likely means one of the SparkListeners is too slow and cannot keep up with
the rate at which tasks are being started by the scheduler.
17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped
1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
[Stage 1:============================================>     (13320 + 41) /
15000]17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus:
Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
[Stage 1:==============================================>   (13867 + 40) /
15000]17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus:
Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
[Stage 1:===============================================>  (14277 + 40) /
15000]17/07/18 13:25:10 INFO
org.spark_project.jetty.server.ServerConnector: Stopped
ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus:
SparkListenerBus has already stopped! Dropping event
SparkListenerExecutorMetricsUpdate(4,WrappedArray())
And similar WARN/INFO messages continue occurring.

When I try to access the UI, I get:

Problem accessing /proxy/application_1500380353993_0001/. Reason:

    Connection to http://10.142.0.17:4040 refused

Caused by:

org.apache.http.conn.HttpHostConnectException: Connection to
http://10.142.0.17:4040 refused
	at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
	at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
	at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)



I noticed that this JIRA issue describes something similar, and I suspect it
is related: https://issues.apache.org/jira/browse/SPARK-18838.
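One knob the "no remaining room in event queue" error points at is the listener
bus queue size, which is configurable. Something like the following might be
worth trying (a sketch: the property was spark.scheduler.listenerbus.eventqueue.size
in Spark 2.0-2.2 and was renamed to ...eventqueue.capacity in later releases,
so check the configuration docs for your version; my_app.py is a placeholder):

```shell
# Enlarge the listener bus event queue (the default is 10000 events).
# Property name varies by Spark version; this is the Spark 2.0-2.2 name.
spark-submit \
  --conf spark.scheduler.listenerbus.eventqueue.size=100000 \
  my_app.py
```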

On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari <fe...@gmail.com>
wrote:



-- 
*Saatvik Shah,*
*Masters in the School of Computer Science,*
*Carnegie Mellon University,*
*LinkedIn <https://www.linkedin.com/in/saatvikshah/>, Website
<https://saatvikshah1994.github.io/>*

Re: Spark UI crashes on Large Workloads

Posted by Riccardo Ferrari <fe...@gmail.com>.
Hi,
Can you share more details? Do you have any exceptions from the driver or
the executors?

best,

On Jul 18, 2017 02:49, "saatvikshah1994" <sa...@gmail.com> wrote:
