Posted to dev@spark.apache.org by Stephen Boesch <ja...@gmail.com> on 2017/11/22 23:07:45 UTC

Spark.ml roadmap 2.3.0 and beyond

The roadmaps for prior releases, e.g. 1.6, 2.0, 2.1, 2.2, were available:

2.2.0 https://issues.apache.org/jira/browse/SPARK-18813

2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
..

It seems those roadmaps were not available per se for 2.3.0 and later? Is
there a different mechanism for that info?

stephenb

Re: Spark.ml roadmap 2.3.0 and beyond

Posted by Stephen Boesch <ja...@gmail.com>.

awesome thanks Joseph

2018-03-20 14:51 GMT-07:00 Joseph Bradley <jo...@databricks.com>:

> The promised roadmap JIRA:
> https://issues.apache.org/jira/browse/SPARK-23758
>
> Note it doesn't have much explicitly listed yet, but committers can add
> items as they agree to shepherd them.  (Committers, make sure to check what
> you're currently listed as shepherding!)  The links for searching can be
> useful too.

Re: Spark.ml roadmap 2.3.0 and beyond

Posted by Joseph Bradley <jo...@databricks.com>.

The promised roadmap JIRA: https://issues.apache.org/jira/browse/SPARK-23758

Note it doesn't have much explicitly listed yet, but committers can add
items as they agree to shepherd them.  (Committers, make sure to check what
you're currently listed as shepherding!)  The links for searching can be
useful too.

On Thu, Dec 7, 2017 at 3:55 PM, Stephen Boesch <ja...@gmail.com> wrote:

> Thanks Joseph.  We can wait for post 2.3.0.


-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

[image: http://databricks.com] <http://databricks.com/>

Re: Spark.ml roadmap 2.3.0 and beyond

Posted by Stephen Boesch <ja...@gmail.com>.

Thanks Joseph.  We can wait for post 2.3.0.

2017-12-07 15:36 GMT-08:00 Joseph Bradley <jo...@databricks.com>:

> Hi Stephen,
>
> I used to post those roadmap JIRAs to share instructions for contributing
> to MLlib and to try to coordinate amongst committers.  My feeling was that
> the coordination aspect was of mixed success, so I did not post one for
> 2.3.  I'm glad you pinged about this; if those were useful, then I can plan
> on posting one for the release after 2.3.  As far as identifying
> committers' plans, the best option right now is to look for Shepherds in
> JIRA as well as the few mailing list threads about directions.
>
> For myself, I'm mainly focusing on fixing some issues with persistence for
> custom algorithms in PySpark (done), adding the image schema (done), and
> using ML Pipelines in Structured Streaming (WIP).
>
> Joseph

Re: Spark.ml roadmap 2.3.0 and beyond

Posted by Joseph Bradley <jo...@databricks.com>.

Hi Stephen,

I used to post those roadmap JIRAs to share instructions for contributing
to MLlib and to try to coordinate amongst committers.  My feeling was that
the coordination aspect was of mixed success, so I did not post one for
2.3.  I'm glad you pinged about this; if those were useful, then I can plan
on posting one for the release after 2.3.  As far as identifying
committers' plans, the best option right now is to look for Shepherds in
JIRA as well as the few mailing list threads about directions.

For myself, I'm mainly focusing on fixing some issues with persistence for
custom algorithms in PySpark (done), adding the image schema (done), and
using ML Pipelines in Structured Streaming (WIP).

Joseph

On Wed, Nov 29, 2017 at 6:52 AM, Stephen Boesch <ja...@gmail.com> wrote:

> There are several JIRAs and/or PRs that contain logic the Data Science
> teams that I work with use in their local models. We are trying to
> determine if/when these features may gain traction again.  In at least one
> case all of the work was done, but the shepherd said that getting it
> committed was of lower priority than other tasks - one specifically
> mentioned was the mllib/ml parity that has been ongoing for nearly three
> years.
>
> In order to prioritize the work that the ML platform would do, it would be
> helpful to know at least which, if any, of those tasks are going to be moved
> ahead by the community, since we could then focus on other ones instead of
> duplicating the effort.
>
> In addition there are some engineering code jam sessions that happen
> periodically: knowing which features are actively on the roadmap would
> *certainly* influence our selection of work.  The roadmaps from 2.2.0 and
> earlier were a very good starting point to understand not just the specific
> work in progress but also the current mindset/thinking of the committers in
> terms of general priorities.
>
> So if the same format of document is not available, then what content *is*
> there that gives a picture of where spark.ml is headed?



Re: Spark.ml roadmap 2.3.0 and beyond

Posted by Stephen Boesch <ja...@gmail.com>.

There are several JIRAs and/or PRs that contain logic the Data Science
teams that I work with use in their local models. We are trying to
determine if/when these features may gain traction again.  In at least one
case all of the work was done, but the shepherd said that getting it
committed was of lower priority than other tasks - one specifically
mentioned was the mllib/ml parity that has been ongoing for nearly three
years.

In order to prioritize the work that the ML platform would do, it would be
helpful to know at least which, if any, of those tasks are going to be moved
ahead by the community, since we could then focus on other ones instead of
duplicating the effort.

In addition there are some engineering code jam sessions that happen
periodically: knowing which features are actively on the roadmap would
*certainly* influence our selection of work.  The roadmaps from 2.2.0 and
earlier were a very good starting point to understand not just the specific
work in progress but also the current mindset/thinking of the committers in
terms of general priorities.

So if the same format of document is not available, then what content *is*
there that gives a picture of where spark.ml is headed?

2017-11-29 6:39 GMT-08:00 Stephen Boesch <ja...@gmail.com>:

> Any further information/thoughts?

Re: Spark.ml roadmap 2.3.0 and beyond

Posted by Stephen Boesch <ja...@gmail.com>.

Any further information/thoughts?



2017-11-22 15:07 GMT-08:00 Stephen Boesch <ja...@gmail.com>:

> The roadmaps for prior releases, e.g. 1.6, 2.0, 2.1, 2.2, were available:
>
> 2.2.0 https://issues.apache.org/jira/browse/SPARK-18813
>
> 2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
> ..
>
> It seems those roadmaps were not available per se for 2.3.0 and later? Is
> there a different mechanism for that info?
>
> stephenb
>