Posted to user@spark.apache.org by Qingsheng Ren <re...@gmail.com> on 2020/04/09 08:36:35 UTC

[Spark MLlib]: Multiple input dataframes and non-linear ML pipeline

Hi all,

I'm using ML Pipeline to construct a flow of transformations. I'm wondering
if it is possible to pass multiple dataframes as the input of a transformer.
For example, I need to join two dataframes together in a transformer, then
feed the result into the estimator for training. If not, is there any plan to
support this in the future?

Another question is about non-linear pipelines. Since we can arbitrarily
assign the input and output columns of a pipeline stage, what will happen if
I build a problematic DAG (like a circular one)? Is there any mechanism to
prevent this from happening?
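
To make the circular case concrete, here is a minimal plain-Python sketch of
the kind of check I have in mind. It does not use any Spark API: the function
name check_stage_order and the (input columns, output column) representation
of a stage are hypothetical, chosen only for illustration. It links each stage
to the stages that consume its output column and runs a topological sort
(Kahn's algorithm), returning None when the column wiring is circular.

```python
from collections import defaultdict, deque


def check_stage_order(stages):
    """Return a valid execution order for the stages, or None if they form a cycle.

    `stages` maps a stage name to (input_columns, output_column).
    This is a hypothetical representation, not a Spark data structure.
    """
    # Which stage produces each column.
    producer = {out: name for name, (_, out) in stages.items()}

    downstream = defaultdict(list)          # stage -> stages that consume its output
    indegree = {name: 0 for name in stages}
    for name, (inputs, _) in stages.items():
        for col in inputs:
            src = producer.get(col)
            if src is not None:             # columns nobody produces come from the raw input
                downstream[src].append(name)
                indegree[name] += 1

    # Kahn's algorithm: repeatedly schedule stages whose inputs are all available.
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in downstream[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any stage left unscheduled is part of (or depends on) a cycle.
    return order if len(order) == len(stages) else None


linear = {
    "tokenizer": (["text"], "words"),
    "hashingTF": (["words"], "features"),
    "lr": (["features"], "prediction"),
}
circular = {
    "a": (["y"], "x"),   # a needs y, which only b produces
    "b": (["x"], "y"),   # b needs x, which only a produces
}

linear_order = check_stage_order(linear)
circular_order = check_stage_order(circular)
```

With the two example dictionaries above, the linear wiring yields an order and
the circular one yields None. My question is whether Pipeline performs a
comparable validation itself, or whether this is left to the user.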

Thanks~

Qingsheng (Patrick) Ren