Posted to user@spark.apache.org by Everett Anderson <ev...@nuna.com.INVALID> on 2016/10/24 23:27:08 UTC

Modifying Metadata in StructType schemas

Hi,

I've been using the immutable Metadata within the StructType of a
DataFrame/Dataset to track application-level column lineage.

However, since it's immutable, the only way to modify it is a full round
trip:

   1. Convert DataFrame/Dataset to Row RDD
   2. Create new, modified Metadata per column from the old
   3. Create a new StructType with the modified metadata
   4. Convert the Row RDD + StructType schema to a DataFrame/Dataset
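The four steps above can be sketched in Scala roughly like this. It is a minimal illustration assuming Spark 2.x's `SparkSession` API. The helper name `addLineage` and the `lineage` metadata key are hypothetical, chosen for the example, and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{MetadataBuilder, StructType}

object MetadataRoundTrip {
  // Hypothetical helper: tags every column with a "lineage" entry by
  // rebuilding the DataFrame from its Row RDD (the round trip above).
  def addLineage(spark: SparkSession, df: DataFrame, source: String): DataFrame = {
    // 1. Convert the DataFrame to an RDD[Row]
    val rows = df.rdd
    // 2. + 3. Build a new StructType whose fields carry modified metadata
    val newSchema = StructType(df.schema.fields.map { f =>
      val md = new MetadataBuilder()
        .withMetadata(f.metadata)     // keep the existing entries
        .putString("lineage", source) // hypothetical key
        .build()
      f.copy(metadata = md)
    })
    // 4. Recreate a DataFrame from the Row RDD and the new schema
    spark.createDataFrame(rows, newSchema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("md").getOrCreate()
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    val out = addLineage(spark, df, "raw_events")
    out.schema.fields.foreach(f => println(s"${f.name}: ${f.metadata.json}"))
    spark.stop()
  }
}
```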

It looks like converting to and from an RDD might involve real work
(presumably every row is converted out of Spark's internal format into Row
objects and back), even though in this case the data itself isn't modified
at all.

Is there a better way to do this?
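One alternative that may be worth trying, sketched here as an assumption rather than something confirmed in this thread: re-alias each column to its own name with `Column.as(alias, metadata)`, which attaches new metadata inside a `select` without leaving the DataFrame API. The helper name `addLineage` and the `lineage` key are again hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

object MetadataViaAlias {
  // Hypothetical helper: re-aliases every column to its own name while
  // attaching updated metadata. Only the logical plan changes; the data
  // is never converted to an RDD.
  def addLineage(df: DataFrame, source: String): DataFrame = {
    val cols = df.schema.fields.map { f =>
      val md = new MetadataBuilder()
        .withMetadata(f.metadata)     // keep the existing entries
        .putString("lineage", source) // hypothetical key
        .build()
      // Backticks keep names containing dots from being read as nested fields
      col(s"`${f.name}`").as(f.name, md)
    }
    df.select(cols: _*)
  }
}
```

Because `select` only adds a projection to the logical plan, the optimizer can usually collapse it, so no per-row conversion is needed. Later Spark releases (3.3+) also added `Dataset.withMetadata(columnName, metadata)` for the single-column case.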

Thanks!