Posted to user@spark.apache.org by Chris Thomas <he...@gmail.com> on 2020/05/25 17:55:57 UTC

Fwd: Spark API and immutability

The cache() method on the DataFrame API caught me out.

Having learnt that DataFrames are built on RDDs and that RDDs are
immutable, when I saw the statement df.cache() in our codebase I thought
‘This must be a bug: the result is not assigned, so the statement will have
no effect.’

However, I’ve since learnt that the cache method actually mutates the
DataFrame object*. The statement was valid after all.

I understand that the underlying user data is immutable, but doesn’t
mutating the DataFrame object make the API a little inconsistent and harder
to reason about?

Regards

Chris


* (as do the persist and rdd.setName methods; I expect there are others)

Re: Spark API and immutability

Posted by Holden Karau <ho...@pigscanfly.ca>.
So even on RDDs, cache/persist mutate the RDD object. The important thing
for Spark is that the data represented by (or held in) the RDD/DataFrame
isn’t mutated.
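
To make that distinction concrete, here is a minimal toy model in plain
Python. It is not Spark code: ToyDataset, set_name and is_cached are
hypothetical names invented for this sketch, and it only imitates the shape
of the behaviour described above, where the rows are fixed at construction
while bookkeeping on the handle (cached flag, name) can change. As far as I
know, Spark's own cache() and persist() likewise return the object they
were called on, which is why leaving the result unassigned is not a bug.

```python
class ToyDataset:
    """Toy stand-in for an RDD/DataFrame handle. Not Spark's implementation."""

    def __init__(self, rows, name=None):
        self._rows = tuple(rows)   # the data: fixed once the handle exists
        self.name = name           # metadata on the handle: may change
        self.is_cached = False     # metadata on the handle: may change

    def map(self, fn):
        # A transformation leaves this handle alone and returns a new one.
        return ToyDataset(fn(r) for r in self._rows)

    def cache(self):
        # Flips a flag on this handle and returns the very same object.
        self.is_cached = True
        return self

    def set_name(self, name):
        # Also changes only metadata, and also returns the same object.
        self.name = name
        return self

    def collect(self):
        return list(self._rows)


ds = ToyDataset([1, 2, 3])
before = ds.collect()

ds.cache()                 # result deliberately not assigned
ds.set_name("numbers")     # same here

doubled = ds.map(lambda x: x * 2)   # new handle, new data, not cached
```

After this runs, ds reports itself as cached and named, yet ds.collect()
still returns exactly what it returned before, and doubled is a separate
handle with its own uncached state.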

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau