Posted to user@spark.apache.org by puneetloya <pu...@gmail.com> on 2019/06/16 05:08:16 UTC

Spark 2.4.3 - Structured Streaming - high on Storage Memory

Hi,

Just upgraded Spark from 2.2.3 to 2.4.3. 
(Screenshot attached: http://apache-spark-user-list.1001560.n3.nabble.com/file/t8825/spark-screen-shot.png)

Ran a load test with a week's worth of messages in Kafka and am seeing odd
behavior: why is the storage memory so high (see the screenshot above)? We
have run similar workloads on Spark 2.2.3 and never seen anything like this.
Has something fundamental about Spark changed?
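
For anyone trying to reproduce this, the per-executor storage numbers can
also be sampled outside the UI (a minimal sketch only; it assumes an active
SparkSession named spark):

    val status = spark.sparkContext.getExecutorMemoryStatus
    status.foreach { case (executor, (maxMem, remaining)) =>
      // getExecutorMemoryStatus reports, per executor, the maximum memory
      // available for caching and how much of it remains; the difference
      // is what the UI shows as used storage memory.
      val usedMb = (maxMem - remaining) / (1024 * 1024)
      println(s"$executor: storage used ~$usedMb MB of ${maxMem / (1024 * 1024)} MB")
    }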

Our main changes for 2.4.3:
1) We started using the Cassandra sink supported in Spark 2.4.
2) Moved from Hadoop 2.7.3 to Hadoop 3.1.1, mainly because we use S3
checkpointing and the AWS SDK bundled with Hadoop 2.7.3 does not have a fix
for connection retries to S3 storage (a sketch of such a query follows
below).
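
For reference, the rough shape of an S3-checkpointed query like ours (a
minimal sketch only; the broker, topic, and bucket names are placeholders,
not our real ones):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("structured-streaming-s3-checkpoint")
      .getOrCreate()

    // Read from Kafka (placeholder broker and topic names).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Offsets and state are checkpointed to S3 through the s3a
    // connector that ships with Hadoop 3.x.
    val query = stream.writeStream
      .format("console")
      .option("checkpointLocation", "s3a://some-bucket/checkpoints/events")
      .start()

    query.awaitTermination()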

Thanks,
Puneet





Re: Spark 2.4.3 - Structured Streaming - high on Storage Memory

Posted by puneetloya <pu...@gmail.com>.
Hi, amazing Spark team,

I have been closely following these issues:

https://issues.apache.org/jira/browse/SPARK-27648
and then recently this:
https://issues.apache.org/jira/browse/SPARK-29055

It looks like all of this is fixed in this pull request:
https://github.com/apache/spark/pull/25973, which was merged to master as
well as branch-2.4.

We have been keen to adopt Spark 2.4, but this problem has been blocking our
upgrade from Spark 2.2.

Is there a plan to release Spark 2.4.5? If not, could you please consider
doing so?

Thanks,
Puneet





Re: Spark 2.4.3 - Structured Streaming - high on Storage Memory

Posted by puneetloya <pu...@gmail.com>.
Just more info on the above post:

We have been seeing a lot of these log messages:

1) "The state for version 15109 (other numbers too) doesn't exist in
loadedMaps. Reading snapshot file and delta files if needed... Note that this
is normal for the first batch of starting query."

2) "KafkaConsumer cache hitting max capacity of 64, removing consumer for
CacheKey"
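
For anyone else hitting these messages, the two settings that appear to
govern them are below (a hedged sketch; the values are illustrative, and
whether these settings explain the memory growth is an assumption to verify):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("state-and-consumer-cache-tuning")
      // Number of recent state store versions each executor keeps in
      // memory (the loadedMaps the first message refers to); defaults
      // to 2 in Spark 2.4.
      .config("spark.sql.streaming.maxBatchesToRetainInMemory", "2")
      // Capacity of the cached KafkaConsumer pool per executor; the
      // second message is logged when the pool overflows its default
      // capacity of 64 and evicts a consumer.
      .config("spark.sql.kafkaConsumerCache.capacity", "128")
      .getOrCreate()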


