Posted to dev@spark.apache.org by Sahm Stephan <s....@reply.de> on 2017/12/01 10:35:22 UTC

private methods in mllib

Dear Spark developers,

I just tried to modify MultivariateGaussian from org.apache.spark.mllib.stat.distribution and failed miserably because so many things are private.

Why? What would be a good way to deal with this? Rewrite everything?

thanks a lot,
best,
Stephan


Stephan Sahm

Data Reply
a Reply AG company

Luise-Ullrich-Straße 14
80636 - München - GERMANY
phone: +49 89 411142-0
mobile: +49 151 19567092
s.sahm@reply.de
www.reply.de

[Data Reply]

________________________________

Data Reply GmbH
Sitz/Registered Office: München
Handelsregister/Register of Companies: Amtsgericht München, HRB 219581
Geschäftsführer/Managing Directors: Michele Giordano, Nikolaos Radouniklis, Daniel Wajngarten

RE: private methods in mllib

Posted by Sahm Stephan <s....@reply.de>.
I just want to get things running, and sometimes the easiest and fastest way is to adapt the code a bit here and there.
Scala implicits are lovely for this; however, if things are private, there is no way to add extra functionality.

Currently I am just copying the Spark classes I need access to, which feels like an unwanted workaround.
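
A minimal sketch of that limitation, assuming a hypothetical stand-in class Gaussian1D (it is not Spark's actual MultivariateGaussian, and all names here are made up for illustration): an implicit class can add a method built on the public pdf, but it cannot reach the private normalizer.

```scala
// Hypothetical stand-in for a library class; NOT Spark's implementation.
final class Gaussian1D(val mu: Double, val sigma: Double) {
  // Private helper: invisible to any code outside this class.
  private def normalizer: Double = 1.0 / (sigma * math.sqrt(2 * math.Pi))

  // Public API: the only surface an extension can build on.
  def pdf(x: Double): Double = {
    val z = (x - mu) / sigma
    normalizer * math.exp(-0.5 * z * z)
  }
}

object GaussianSyntax {
  // Implicit-class extension: adds logPdf without modifying Gaussian1D.
  implicit class RichGaussian(g: Gaussian1D) {
    def logPdf(x: Double): Double = math.log(g.pdf(x))

    // The next line would not compile, because normalizer is private:
    // def norm: Double = g.normalizer
  }
}

object Demo {
  import GaussianSyntax._

  def main(args: Array[String]): Unit = {
    val g = new Gaussian1D(mu = 0.0, sigma = 1.0)
    // logPdf is resolved through the implicit class, not Gaussian1D itself.
    println(g.logPdf(0.0))
  }
}
```

The extension works only as long as everything it needs is reachable through public members; anything that depends on a private helper has to be reimplemented or copied, which is the situation described above.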



________________________________
From: holden.karau@gmail.com [holden.karau@gmail.com] on behalf of Holden Karau [holden@pigscanfly.ca]
Sent: 01 December 2017 19:33
To: Jacek Laskowski
Cc: Sahm Stephan; dev@spark.apache.org
Subject: Re: private methods in mllib

So I want to be clear, many of these things are private in org.apache.spark.ml as well. The implementation details of the algorithms may change, so changing parts of the internals isn't easily supported. What are you trying to change or add?



Re: private methods in mllib

Posted by Holden Karau <ho...@pigscanfly.ca>.
So I want to be clear, many of these things are private in
org.apache.spark.ml as well. The implementation details of the algorithms
may change, so changing parts of the internals isn't easily supported. What
are you trying to change or add?


-- 
Twitter: https://twitter.com/holdenkarau

Re: private methods in mllib

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Sahm,

Unless I'm mistaken [1], org.apache.spark.mllib is on hold and is
considered @deprecated these days. That'd explain why "so many things made
private".

[1]
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/package.scala#L21

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
