Posted to issues@spark.apache.org by "Everett Rush (JIRA)" <ji...@apache.org> on 2019/03/22 17:22:00 UTC

[jira] [Created] (SPARK-27249) Developers API for Transformers beyond UnaryTransformer

Everett Rush created SPARK-27249:
------------------------------------

             Summary: Developers API for Transformers beyond UnaryTransformer
                 Key: SPARK-27249
                 URL: https://issues.apache.org/jira/browse/SPARK-27249
             Project: Spark
          Issue Type: New Feature
          Components: ML
    Affects Versions: 2.5.0
            Reporter: Everett Rush


It would be nice to have a developers' API for DataFrame transformers that need more than one column from a row (i.e., more than UnaryTransformer supports) or that hold objects too expensive to initialize repeatedly in a UDF, such as a database connection.

 

Design:

The abstract class PartitionTransformer extends Transformer and defines the partition transformation function as Iterator[Row] => Iterator[Row].

NB: This parallels UnaryTransformer's createTransformFunc method.

 

When developers subclass this transformer, they can either provide their own schema for the output Row, or set the output DataType and output column name, in which case the PartitionTransformer class will create a new schema and a row encoder.
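
A minimal sketch of how such a class might look, to make the design above concrete. PartitionTransformer does not exist in Spark; the member names outputCol, outputDataType and createPartitionFunc, and the SumColumns example with its columns "a", "b" and "sum", are hypothetical and chosen only for illustration. The sketch covers only the "set output DataType and column name" path, and it goes through the RDD API plus createDataFrame instead of building a row encoder directly, since the encoder API has differed between Spark versions. It assumes Spark SQL and MLlib are on the classpath.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.types.{DataType, DoubleType, StructField, StructType}

// Hypothetical base class sketched from the proposal; not part of Spark.
abstract class PartitionTransformer extends Transformer {

  // Name and type of the single appended column (assumed design).
  protected def outputCol: String
  protected def outputDataType: DataType

  // Returns the per-partition function, analogous to
  // UnaryTransformer.createTransformFunc.
  protected def createPartitionFunc: Iterator[Row] => Iterator[Row]

  // Output schema = input schema plus the new column.
  override def transformSchema(schema: StructType): StructType = {
    require(!schema.fieldNames.contains(outputCol),
      s"Output column $outputCol already exists.")
    StructType(schema.fields :+ StructField(outputCol, outputDataType, nullable = true))
  }

  override def transform(dataset: Dataset[_]): DataFrame = {
    val outSchema = transformSchema(dataset.schema)
    // createPartitionFunc is invoked inside the closure, so it runs on the
    // executor once per partition. The closure captures this transformer,
    // so subclasses must keep their fields serializable.
    val rows = dataset.toDF().rdd.mapPartitions(part => createPartitionFunc(part))
    dataset.sparkSession.createDataFrame(rows, outSchema)
  }
}

// Hypothetical subclass: reads two Double columns per row and appends their sum.
class SumColumns(override val uid: String) extends PartitionTransformer {
  def this() = this(Identifiable.randomUID("sumColumns"))

  override protected def outputCol: String = "sum"
  override protected def outputDataType: DataType = DoubleType

  override protected def createPartitionFunc: Iterator[Row] => Iterator[Row] = { rows =>
    // Expensive setup (for example, opening a database connection) would go
    // here, so it happens once per partition rather than once per row.
    rows.map { row =>
      val total = row.getAs[Double]("a") + row.getAs[Double]("b")
      Row.fromSeq(row.toSeq :+ total)
    }
  }

  override def copy(extra: ParamMap): SumColumns = defaultCopy(extra)
}
```

With a DataFrame df that has Double columns "a" and "b", new SumColumns().transform(df) would return df with an extra "sum" column. Letting subclasses supply a complete output schema, the other option described above, could be supported by allowing transformSchema to be overridden.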



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
