Posted to issues@spark.apache.org by "Everett Rush (JIRA)" <ji...@apache.org> on 2019/03/22 17:22:00 UTC
[jira] [Created] (SPARK-27249) Developers API for Transformers beyond UnaryTransformer
Everett Rush created SPARK-27249:
------------------------------------
Summary: Developers API for Transformers beyond UnaryTransformer
Key: SPARK-27249
URL: https://issues.apache.org/jira/browse/SPARK-27249
Project: Spark
Issue Type: New Feature
Components: ML
Affects Versions: 2.5.0
Reporter: Everett Rush
It would be nice to have a developer API for DataFrame transformers that go beyond UnaryTransformer: transformers that need more than one column from a row (UnaryTransformer reads a single input column), or that hold objects too expensive to initialize repeatedly in a UDF, such as a database connection.
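To make the second point concrete, here is a minimal sketch, assuming Spark 2.x or later on the classpath, of doing the work once per partition with the existing RDD mapPartitions API instead of once per row in a UDF. PerPartitionDemo, ExpensiveResource and addLookup are hypothetical names; ExpensiveResource stands in for something like a database connection, and the input columns "a" and "b" are assumed.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField}

object PerPartitionDemo {
  // Hypothetical stand-in for an object that is costly to create,
  // such as a database connection or a loaded model.
  final class ExpensiveResource {
    def lookup(a: Long, b: Long): Long = a * 10 + b
    def close(): Unit = ()
  }

  // Reads two columns ("a" and "b", assumed to be LongType) from each row and
  // appends a "lookup" column. The resource is built once per partition.
  def addLookup(df: DataFrame): DataFrame = {
    val ia = df.schema.fieldIndex("a") // resolve column positions on the driver
    val ib = df.schema.fieldIndex("b")
    val outSchema = df.schema.add(StructField("lookup", LongType, nullable = false))

    val rows = df.rdd.mapPartitions { iter =>
      val resource = new ExpensiveResource // one instance per partition
      // Materialized so the resource can be closed before returning.
      val out = iter.map { row =>
        Row.fromSeq(row.toSeq :+ resource.lookup(row.getLong(ia), row.getLong(ib)))
      }.toList
      resource.close()
      out.iterator
    }
    df.sparkSession.createDataFrame(rows, outSchema)
  }
}
```

Materializing each partition with toList keeps the sketch short but holds the whole partition in memory; a production version would more likely close the resource from a task-completion callback and stream the iterator.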
Design:
An abstract class, PartitionTransformer, extends Transformer and defines the partition transformation function with type Iterator[Row] => Iterator[Row].
NB: This parallels the createTransformFunc method of UnaryTransformer.
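A minimal sketch of what the proposed abstract class might look like, assuming Spark 2.x or later. PartitionTransformer does not exist in Spark; its shape here is an illustration of the design above, not a committed API. For simplicity it applies the partition function through the RDD API and rebuilds the DataFrame with SparkSession.createDataFrame rather than constructing a row encoder. TwoColumnSum is a hypothetical example subclass that reads two columns, assumed to be named "a" and "b" and of LongType.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical sketch of the proposed developer API (not part of Spark).
abstract class PartitionTransformer extends Transformer {
  // Partition-level analogue of UnaryTransformer.createTransformFunc:
  // the function receives every row of one partition, so per-partition
  // setup (e.g. opening a connection) happens once rather than per row.
  protected def createTransformFunc: Iterator[Row] => Iterator[Row]

  // Subclasses still implement transformSchema(schema), describing the rows
  // that createTransformFunc emits, and copy(extra).

  override def transform(dataset: Dataset[_]): DataFrame = {
    val outSchema = transformSchema(dataset.schema)
    val f = createTransformFunc
    val rows = dataset.toDF().rdd.mapPartitions(f)
    dataset.sparkSession.createDataFrame(rows, outSchema)
  }
}

// Hypothetical example subclass: appends "sum" = a + b, using two input columns.
class TwoColumnSum(override val uid: String) extends PartitionTransformer {
  def this() = this(Identifiable.randomUID("twoColumnSum"))

  override protected def createTransformFunc: Iterator[Row] => Iterator[Row] =
    rows => rows.map { r =>
      Row.fromSeq(r.toSeq :+ (r.getAs[Long]("a") + r.getAs[Long]("b")))
    }

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("sum", LongType, nullable = false))

  override def copy(extra: ParamMap): TwoColumnSum = defaultCopy(extra)
}
```

Usage would mirror any other Transformer, for example new TwoColumnSum().transform(df), so such a stage could also sit inside a Pipeline.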
When developers subclass this transformer, they can either provide their own schema for the output Row, or set the output DataType and output column name, in which case the PartitionTransformer class will create the new schema and a row encoder.
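The schema part of that choice could be resolved roughly as in this minimal sketch. PartitionTransformerSchema, outputSchema and its parameters are hypothetical names. Appending a nullable column is an assumption, and rejecting an output name that already exists is modeled on what UnaryTransformer.transformSchema does. Building the row encoder from the resulting schema is left out.

```scala
import org.apache.spark.sql.types.{DataType, StructField, StructType}

// Hypothetical helper (not part of Spark) covering both options above.
object PartitionTransformerSchema {
  def outputSchema(
      input: StructType,
      customSchema: Option[StructType],
      outputCol: String,
      outputDataType: DataType): StructType =
    // Option 1: the subclass supplied a full output schema; use it as-is.
    customSchema.getOrElse {
      // Option 2: derive the schema by appending one output column.
      require(!input.fieldNames.contains(outputCol),
        s"Output column $outputCol already exists.")
      input.add(StructField(outputCol, outputDataType, nullable = true))
    }
}
```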
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)