Posted to user@spark.apache.org by Prarthi Jain <pr...@gmail.com> on 2022/11/20 18:20:02 UTC

Spark performance on small dataset

Hi Everyone,

Spark and the RDD approach it favors assume that most applications run on
big data and need massive parallelism via sharding and concurrent
computing. But some tasks run on small data and neither need nor benefit
from RDD parallelism. How are these tasks expected to perform on Spark?

Looking forward to more insights on this!

Thanks,
Prarthi