Posted to dev@spark.apache.org by Sahm Stephan <s....@reply.de> on 2017/12/01 10:35:22 UTC
private methods in mllib
Dear Spark developers,
I just tried to modify MultivariateGaussian from org.apache.spark.mllib.stat.distribution and failed miserably because so many of its members are private.
Why? What would be a good way to deal with this? Rewrite everything?
Thanks a lot,
best,
Stephan
Stephan Sahm
Data Reply
a Reply AG company
Luise-Ullrich-Straße 14
80636 - München - GERMANY
phone: +49 89 411142-0
mobile: +49 151 19567092
s.sahm@reply.de
www.reply.de
________________________________
Data Reply GmbH
Sitz/Registered Office: München
Handelsregister/Register of Companies: Amtsgericht München, HRB 219581
Geschäftsführer/Managing Directors: Michele Giordano, Nikolaos Radouniklis, Daniel Wajngarten
RE: private methods in mllib
Posted by Sahm Stephan <s....@reply.de>.
I just want to get things running, and sometimes the easiest and fastest way is to adapt the code a bit here and there.
Scala implicits are a lovely way to do that; however, if things are private, an implicit wrapper cannot reach them, so there is no way to add extra functionality that depends on them.
Currently I am just copying the Spark classes I need access to, which feels like an undesirable workaround.
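To illustrate the limitation described above, here is a minimal sketch of the implicit-class approach. It uses a hypothetical stand-in class, DiagonalGaussian, rather than Spark's MultivariateGaussian, and the names GaussianSyntax, RichGaussian, logLikelihood and dimension are invented for this example. The enrichment compiles only because it sticks to public members.

```scala
// Hypothetical stand-in for a library class with private internals.
// This is NOT Spark's MultivariateGaussian; it only mimics the situation.
final class DiagonalGaussian(val mu: Vector[Double], val variances: Vector[Double]) {
  require(mu.length == variances.length, "mu and variances must have the same length")

  // Private helper: code outside this class cannot call it.
  private def logNormalizer: Double =
    -0.5 * variances.map(v => math.log(2 * math.Pi * v)).sum

  // Public API: log-density of a Gaussian with diagonal covariance.
  def logpdf(x: Vector[Double]): Double = {
    require(x.length == mu.length, "x must have the same length as mu")
    val quad = x.indices.map(i => math.pow(x(i) - mu(i), 2) / variances(i)).sum
    logNormalizer - 0.5 * quad
  }
}

object GaussianSyntax {
  // Enrichment through an implicit class. It can only use public members
  // (mu, variances, logpdf); a call to g.logNormalizer here would not compile.
  implicit class RichGaussian(g: DiagonalGaussian) {
    // Sum of log-densities over a batch of samples.
    def logLikelihood(samples: Seq[Vector[Double]]): Double =
      samples.map(g.logpdf).sum

    // Number of dimensions, derived from the public mean vector.
    def dimension: Int = g.mu.length
  }
}
```

After `import GaussianSyntax._`, a call such as `new DiagonalGaussian(Vector(0.0, 0.0), Vector(1.0, 1.0)).dimension` resolves through the implicit class. Anything that needs the private helper still requires changing or copying the original class, which is the situation described in this message.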
From: holden.karau@gmail.com [holden.karau@gmail.com] on behalf of Holden Karau [holden@pigscanfly.ca]
Sent: 01 December 2017 19:33
To: Jacek Laskowski
Cc: Sahm Stephan; dev@spark.apache.org
Subject: Re: private methods in mllib
So I want to be clear, many of these things are private in org.apache.spark.ml as well. The implementation details of the algorithms may change, so changing parts of the internals isn't easily supported. What are you trying to change or add?
Re: private methods in mllib
Posted by Holden Karau <ho...@pigscanfly.ca>.
So I want to be clear, many of these things are private in
org.apache.spark.ml as well. The implementation details of the algorithms
may change, so changing parts of the internals isn't easily supported. What
are you trying to change or add?
On Fri, Dec 1, 2017 at 10:29 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi Sahm,
>
> Unless I'm mistaken [1], org.apache.spark.mllib is on hold and is
> considered @deprecated these days. That'd explain why "so many things made
> private".
>
> [1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/package.scala#L21
--
Twitter: https://twitter.com/holdenkarau
Re: private methods in mllib
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Sahm,
Unless I'm mistaken [1], org.apache.spark.mllib is on hold and is
considered @deprecated these days. That'd explain why "so many things made
private".
[1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/package.scala#L21
Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski