Posted to user@spark.apache.org by Sunita <su...@gmail.com> on 2016/07/13 19:23:10 UTC

Re: Severe Spark Streaming performance degradation after upgrading to 1.6.1

I am facing the same issue. Upgrading to Spark 1.6 is causing a huge
performance loss. Were you able to resolve it? I am also experimenting with
the memory settings described at
http://spark.apache.org/docs/latest/configuration.html#memory-management

but so far they are not making much of a difference. I would appreciate
your input on this.
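For reference, this is the shape of what I have been trying; a minimal
sketch, with placeholder values I am still experimenting with rather than
recommendations:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Unified memory manager knobs introduced in Spark 1.6
    // (values are placeholders, not recommendations):
    val conf = new SparkConf()
      .setAppName("StreamingJob")
      // fraction of (heap - 300MB) shared by execution and storage
      .set("spark.memory.fraction", "0.6")
      // share of that region protected from eviction by execution
      .set("spark.memory.storageFraction", "0.5")

    // Alternative experiment: revert to the pre-1.6 static memory
    // manager for comparison (the legacy spark.storage.memoryFraction
    // and spark.shuffle.memoryFraction settings then apply instead):
    // conf.set("spark.memory.useLegacyMode", "true")

    val ssc = new StreamingContext(conf, Seconds(10))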





Re: Severe Spark Streaming performance degradation after upgrading to 1.6.1

Posted by Dibyendu Bhattacharya <di...@gmail.com>.
You can find some good pointers in this JIRA:

https://issues.apache.org/jira/browse/SPARK-15796
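
For context, my reading of that ticket: the 1.6 default
spark.memory.fraction of 0.75 can exceed the JVM old generation (roughly
2/3 of the heap with default GC settings), so long-lived cached blocks
churn through young-generation GC. The workarounds discussed there,
sketched with illustrative values:

    import org.apache.spark.SparkConf

    // Workarounds discussed around SPARK-15796 (verify against the
    // ticket; values here are illustrative):
    val conf = new SparkConf()
      // 1) shrink the unified memory region so cached blocks fit in
      //    the default old generation (the ticket proposes 0.6):
      .set("spark.memory.fraction", "0.6")
      // 2) ...or instead grow the old generation to cover the 0.75
      //    default (NewRatio=3 makes old gen 3/4 of the heap):
      // .set("spark.executor.extraJavaOptions", "-XX:NewRatio=3")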

Dibyendu


On Thu, Jul 14, 2016 at 12:53 AM, Sunita <su...@gmail.com> wrote:

> I am facing the same issue. Upgrading to Spark 1.6 is causing a huge
> performance loss. Were you able to resolve it? I am also experimenting
> with the memory settings described at
> http://spark.apache.org/docs/latest/configuration.html#memory-management
>
> but so far they are not making much of a difference. I would appreciate
> your input on this.

Re: Severe Spark Streaming performance degradation after upgrading to 1.6.1

Posted by Sunita Arvind <su...@gmail.com>.
Thank you for your inputs. I will test this out and share my findings.



On Thursday, July 14, 2016, CosminC <ci...@adobe.com> wrote:

> We didn't have time to investigate much further, but the one thing that
> stood out is that partitioning was no longer working on 1.6.1. This would
> definitely explain the 2x performance loss.
>
> Checking the 1.5.1 Spark logs for the same application showed that our
> partitioner was working correctly: after the DStream / RDD creation, a
> user session was only processed on a single machine. Running on 1.6.1,
> though, the same session was processed on up to 4 machines in a 5-node
> cluster (including the driver), with a lot of redundant operations. We use
> a custom but very simple partitioner which extends HashPartitioner; it
> partitions on a case class with a single String field.
>
> Speculative execution is turned off by default, and we never enabled it,
> so it's not that.
>
> Right now we're postponing any Spark upgrade, and we'll probably try to
> upgrade directly to Spark 2.0, hoping the partitioning issue is no longer
> present there.

Re: Severe Spark Streaming performance degradation after upgrading to 1.6.1

Posted by CosminC <ci...@adobe.com>.
We didn't have time to investigate much further, but the one thing that
stood out is that partitioning was no longer working on 1.6.1. This would
definitely explain the 2x performance loss.

Checking the 1.5.1 Spark logs for the same application showed that our
partitioner was working correctly: after the DStream / RDD creation, a
user session was only processed on a single machine. Running on 1.6.1,
though, the same session was processed on up to 4 machines in a 5-node
cluster (including the driver), with a lot of redundant operations. We use
a custom but very simple partitioner which extends HashPartitioner; it
partitions on a case class with a single String field.
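
To make the setup concrete, the partitioner is essentially of this shape
(identifiers below are stand-ins, not our actual code):

    import org.apache.spark.HashPartitioner

    // Stand-in names: a key case class with a single String field,
    // as described above
    case class SessionKey(id: String)

    // Case classes get structural hashCode/equals, so HashPartitioner's
    // inherited getPartition already maps equal SessionKeys to the same
    // partition; the override below just makes the keying explicit
    class SessionPartitioner(partitions: Int)
        extends HashPartitioner(partitions) {
      override def getPartition(key: Any): Int = key match {
        case SessionKey(id) =>
          // same non-negative modulo scheme HashPartitioner uses
          val m = id.hashCode % numPartitions
          if (m < 0) m + numPartitions else m
        case other => super.getPartition(other)
      }
    }

    // e.g.: sessionRdd.partitionBy(new SessionPartitioner(16))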

Speculative execution is turned off by default, and we never enabled it,
so it's not that.
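
If you want to rule this out on your side too, a quick sanity check from
the driver (assuming an existing SparkContext `sc`, e.g. in spark-shell):

    // speculation defaults to off; SparkConf.get with a default
    // handles the unset-key case
    println(sc.getConf.get("spark.speculation", "false"))  // prints: false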

Right now we're postponing any Spark upgrade, and we'll probably try to
upgrade directly to Spark 2.0, hoping the partitioning issue is no longer
present there.


