You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by act_coder <ac...@gmail.com> on 2021/02/12 06:16:41 UTC
Spark structured streaming with periodical persist and unpersist
I am currently building a spark structured streaming application where I am
doing a batch-stream join. And the source for the batch data gets updated
periodically.
So, I am planning to do a persist/unpersist of that batch data periodically.
Below is a sample code which I am using to persist and unpersist the batch
data.
Flow: -> Read the batch data -> persist the batch data -> For every one
hour, unpersist the data and read the batch data and persist it again.
But, I am not seeing the batch data getting refreshed for every hour.
Code:
var batchDF = handler.readBatchDF(sparkSession)
batchDF.persist(StorageLevel.MEMORY_AND_DISK)
var refreshedTime: Instant = Instant.now()
if (Duration.between(refreshedTime, Instant.now()).getSeconds > refreshTime)
{
refreshedTime = Instant.now()
batchDF.unpersist(false)
batchDF = handler.readBatchDF(sparkSession)
.persist(StorageLevel.MEMORY_AND_DISK)
}
Is there any better way to achieve this scenario in spark structured
streaming jobs ?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org