Posted to user@spark.apache.org by Yanbo Liang <yb...@gmail.com> on 2016/07/02 08:30:16 UTC

Re: Several questions about how pyspark.ml works

Hi Nick,

Please see my inline reply.

Thanks
Yanbo

2016-06-12 3:08 GMT-07:00 XapaJIaMnu <nh...@gmail.com>:

> Hey,
>
> I have some additional Spark ML algorithms implemented in scala that I
> would
> like to make available in pyspark. For a reference I am looking at the
> available logistic regression implementation here:
>
>
> https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/ml/classification.html
>
> I have a couple of questions:
> 1) The constructor for the *class LogisticRegression*, as far as I
> understand, just accepts the arguments and constructs the underlying Scala
> object via /py4j/, passing its arguments on to it. This is done via the line
> *self._java_obj = self._new_java_obj(
> "org.apache.spark.ml.classification.LogisticRegression", self.uid)*
> Is this correct?
> What does the line *super(LogisticRegression, self).__init__()* do?
>

*super(LogisticRegression, self).__init__()* initializes the *Params*
object on the Python side. We store all params on the Python side and only
transfer them to the Scala side when *fit* is called.
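
For illustration, here is a minimal sketch of how a custom Scala Estimator
could be wrapped, following the same pattern as LogisticRegression in 1.6.
The Scala class "org.example.ml.MyAlgorithm", the chosen params and the
model class name are placeholders of mine, not real Spark APIs:

from pyspark import keyword_only
from pyspark.ml.param.shared import HasMaxIter, HasRegParam
from pyspark.ml.wrapper import JavaEstimator, JavaModel


class MyAlgorithmModel(JavaModel):
    """Thin wrapper around the fitted Scala model (see the next answer)."""


class MyAlgorithm(JavaEstimator, HasMaxIter, HasRegParam):
    """Hypothetical Python wrapper for a custom Scala Estimator."""

    @keyword_only
    def __init__(self, maxIter=100, regParam=0.0):
        # set up the Params machinery on the Python side
        super(MyAlgorithm, self).__init__()
        # create the peer JVM object via py4j; only the uid is passed here,
        # the params are kept on the Python side until fit() is called
        self._java_obj = self._new_java_obj(
            "org.example.ml.MyAlgorithm", self.uid)
        kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, maxIter=100, regParam=0.0):
        # store the params on the Python side only
        kwargs = self.setParams._input_kwargs
        self._set(**kwargs)
        return self

    def _create_model(self, java_model):
        # called by JavaEstimator._fit() after the Scala fit() has run;
        # wraps the returned JVM model in the Python model class
        return MyAlgorithmModel(java_model)

When you call fit(df), the params are transferred to the Scala object and
the Scala fit() runs on the JVM; nothing heavy happens at construction time.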


>
> Does this mean that any Python data structures used with it will be
> converted to Java structures once the object is instantiated?
>
> 2) The corresponding model *class LogisticRegressionModel(JavaModel):*
> again just instantiates the Java object and nothing else? Is it enough for
> me to forward the arguments and instantiate the Scala objects?
> Does this mean that when the pipeline is created, even if the pipeline is
> Python, it expects objects that wrap underlying Scala code instantiated by
> /py4j/? Can one use pure Python elements inside the pipeline (dealing with
> RDDs)? What would the performance implications be?
>

*class LogisticRegressionModel(JavaModel)* is only a thin wrapper around
the peer Scala model object; its accessors simply delegate to the JVM
model via /py4j/.
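
Continuing the hypothetical MyAlgorithmModel sketch from above: to expose
values computed on the Scala side, you just add properties that forward the
call to the JVM object (the "coefficients" member is assumed to exist on
the Scala model):

from pyspark.ml.wrapper import JavaModel


class MyAlgorithmModel(JavaModel):
    """Thin wrapper: accessors simply forward to the peer JVM model."""

    @property
    def coefficients(self):
        # calls coefficients on the Scala model via py4j and converts
        # the result back to a Python object
        return self._call_java("coefficients")

So after model = MyAlgorithm(maxIter=10).fit(df), reading
model.coefficients triggers a py4j call into the JVM; the data only crosses
over to Python when you access it.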


>
> Cheers,
>
> Nick
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Several-questions-about-how-pyspark-ml-works-tp27141.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>