Posted to user@spark.apache.org by Akhilanand <ak...@gmail.com> on 2019/02/22 00:35:20 UTC

Difference between Typed and untyped transformation in dataset API

What is the key difference between typed and untyped transformations in the
Dataset API?
How do I determine whether a transformation is typed or untyped?
Any gotchas on when to use which, apart from the fact that it gets the job
done?

RE: Difference between Typed and untyped transformation in dataset API

Posted by em...@yeikel.com.
From what I understand, if a transformation is untyped it returns a DataFrame, otherwise it returns a Dataset. In the Spark source code you will see that the return type of such methods is DataFrame instead of Dataset, and they are also annotated with @group untypedrel. So you can check a method's signature to determine whether it is typed or untyped.
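
To make that concrete, here is a small self-contained sketch (the Person case class and the sample values are mine, not from the Spark API) showing how the declared return types differ: typed methods such as filter and map are tagged @group typedrel in the Spark source and keep the element type, while untyped ones such as select(Column*) are tagged @group untypedrel and come back as a DataFrame.

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type, only for illustration.
case class Person(name: String, age: Int)

object TypedVsUntyped {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-vs-untyped").getOrCreate()
    import spark.implicits._

    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bo", 15)).toDS()

    // Typed transformations (@group typedrel in the Spark source) keep or
    // re-derive the element type, so the compiler still knows what T is:
    val adults: Dataset[Person] = people.filter(_.age >= 18)
    val names:  Dataset[String] = people.map(_.name)

    // Untyped transformations (@group untypedrel) are declared to return
    // DataFrame, which is just an alias for Dataset[Row]:
    val projected: DataFrame = people.select(col("name"))

    spark.stop()
  }
}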

 

In general, anything that changes the type of a column or adds a new column to a Dataset will be untyped. The idea of a Dataset is that its schema stays fixed; the moment you modify the schema, you fall back to a DataFrame.

 

For example, withColumn is untyped because it turns the typed Dataset into an untyped structure (a DataFrame).
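
A rough sketch of what that looks like in practice (Person and PersonWithFlag are hypothetical case classes I made up for illustration): withColumn drops you into a DataFrame, and if the resulting schema happens to line up with a case class you can opt back into a typed view with .as[...].

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case classes, only for illustration.
case class Person(name: String, age: Int)
case class PersonWithFlag(name: String, age: Int, isAdult: Boolean)

object WithColumnExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("withColumn-example").getOrCreate()
    import spark.implicits._

    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bo", 15)).toDS()

    // withColumn changes the schema, so Spark can no longer guarantee that
    // each row is still a Person; the result is an untyped DataFrame.
    val flagged: DataFrame = people.withColumn("isAdult", col("age") >= 18)

    // If the new schema lines up with a case class, you can opt back into
    // a typed view by attaching an encoder with .as[...]:
    val typedAgain: Dataset[PersonWithFlag] = flagged.as[PersonWithFlag]

    typedAgain.show()
    spark.stop()
  }
}

Note that .as[...] does not rewrite any data; it only attaches an encoder, and Spark raises an analysis error if the DataFrame's columns cannot be matched to the case class fields.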

 

From: Akhilanand <ak...@gmail.com> 
Sent: Thursday, February 21, 2019 7:35 PM
To: user <us...@spark.apache.org>
Subject: Difference between Typed and untyped transformation in dataset API

 

What is the key difference between typed and untyped transformations in the Dataset API?

How do I determine whether a transformation is typed or untyped?

Any gotchas on when to use which, apart from the fact that it gets the job done?