Posted to user@spark.apache.org by Everett Anderson <ev...@nuna.com.INVALID> on 2016/10/24 23:27:08 UTC
Modifying Metadata in StructType schemas
Hi,
I've been using the immutable Metadata within the StructType of a
DataFrame/Dataset to track application-level column lineage.
However, since it's immutable, the only way to modify it seems to be a full
round trip:
1. Convert DataFrame/Dataset to Row RDD
2. Create new, modified Metadata per column from the old
3. Create a new StructType with the modified metadata
4. Convert the Row RDD + StructType schema to a DataFrame/Dataset
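
To make the steps concrete, here is a minimal sketch of that round trip in Scala, assuming Spark 2.x. The helper name withLineage and the "lineage" metadata key are hypothetical, made up for illustration; the real application code differs.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{MetadataBuilder, StructType}

// Hypothetical helper: attaches a "lineage" string to the metadata of the
// columns named in `lineage`, leaving all other columns untouched.
def withLineage(df: DataFrame, lineage: Map[String, String]): DataFrame = {
  // Steps 2 and 3: build new Metadata per column from the old one,
  // then assemble a new StructType from the modified fields.
  val newFields = df.schema.fields.map { field =>
    lineage.get(field.name) match {
      case Some(source) =>
        val newMetadata = new MetadataBuilder()
          .withMetadata(field.metadata)   // copy the existing entries
          .putString("lineage", source)   // hypothetical key
          .build()
        field.copy(metadata = newMetadata)
      case None => field
    }
  }
  val newSchema = StructType(newFields)

  // Steps 1 and 4: go through the Row RDD and back to a DataFrame.
  df.sparkSession.createDataFrame(df.rdd, newSchema)
}
```

The data passes through df.rdd unchanged; only the schema handed to createDataFrame differs.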
It looks like conversion to/from an RDD might involve real work, even
though in this case the data itself isn't modified at all.
Is there a better way to do this?
Thanks!