Posted to dev@systemml.apache.org by Matthias Boehm <mb...@googlemail.com> on 2017/01/03 19:50:31 UTC

[DISCUSS] Roadmap SystemML 1.0

I'd like to initiate the discussion of a concrete roadmap for our next
release. According to previous discussions, I think it's fair to say
that we agree on calling it SystemML 1.0. We should carefully plan this
release as it's an opportunity to change APIs and remove some older 
deprecated features. I'd like to encourage not just developers but also 
the broader community to participate in this discussion.

Personally, I think a target date of Q2/2017 is realistic. Let's start 
with collecting the major features and changes that potentially affect 
users. Here is an initial list, but please feel free to add and up- or 
down-vote the individual items.

1) APIs and Language:
* Cleanup new MLContext (matrix/frame data types, move tests, etc)
* Remove old MLContext
* Consolidate MLContext and JMLC
* Full support for Scala/Python DSLs
* Remove old file-based transform
* Scala/Python wrappers for all existing algorithms
* Data converters (additional formats: e.g., libsvm; performance)

2) Updated Dependencies:
* Spark 2.0 support
* Matrix block library (isolated jar)

3) Compiler/Runtime Features:
* GPU support (full compiler and runtime support)
* Compressed linear algebra v2
* Code generation (automatic operator fusion)
* Extended parfor (full spark exploitation, micro-batch support)
* Scale-up architecture (large dense blocks, numa)?

4) Tools
* Extended stats (task locality, shuffle, etc)
* Cloud resource advisor (extended resource optimizer)?

5) Algorithms
* Graduate "staging" algorithms (robustness/performance)
* Perftest: include all algorithms into automated performance tests
* Simplify usage decision trees, random forest, mlogreg, msvm 
(preprocessing, label representation, etc)

Items marked with a ? can potentially be moved out to subsequent releases.


Regards,
Matthias

Re: [DISCUSS] Roadmap SystemML 1.0

Posted by Matthias Boehm <mb...@googlemail.com>.
In order to make this roadmap more concrete, I created the following epics
for the target release 1.0 with about 50 subtasks, and linked related
existing issues. Given the discussion on a short release cycle, the bare
minimum would be SYSTEMML-1299 (which includes all changes that affect the
external behavior), and a subset of SYSTEMML-1308 (especially features that
address proper cleanups and robustness against OOMs).

SYSTEMML-1299 Language feature updates
SYSTEMML-1321 Compiler feature extensions
SYSTEMML-1308 Runtime feature extensions
SYSTEMML-1284 Code generation for operator fusion
SYSTEMML-1328 Perftest extensions

I did not touch GPUs, Deep Learning, DSLs, and algorithms yet. So please
have a look, and update or create them if necessary.


Regards,
Matthias



Re: [DISCUSS] Roadmap SystemML 1.0

Posted by du...@gmail.com.
Yeah using the target release would be good. Actually, with that in mind, I believe that we have been marking closed issues since the 0.11 release as targeting an upcoming "1.0" release, but it would probably be more correct to update those to "0.12" since we decided to release 0.12. In addition, we should set the target of the Spark 2.x support issue to "0.13".

As for the roadmap, it would be good to update the website with a high-level overview, with links to associated JIRA issues.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: [DISCUSS] Roadmap SystemML 1.0

Posted by Luciano Resende <lu...@gmail.com>.
Instead of an epic, could we use the target release? Also, we have a roadmap
page on the site, and we should either keep that up to date or get rid of it
and use the roadmap in JIRA.

On Mon, Jan 16, 2017 at 6:20 PM <du...@gmail.com> wrote:

> Now that we've had some discussion here, it would be good to transfer this
> discussion into a JIRA epic, containing sub tasks. That way, we can
> properly track our progress on these items and facilitate contributions
> from the community.  Note that some of the sub tasks may already exist as
> individual issues.
>
> Would anyone in the community like to volunteer for creating these issues?
>
> - Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jan 4, 2017, at 6:00 PM, dusenberrymw@gmail.com wrote:
> >
> > Overall, this is a good list of items that should be worked on,
> > particularly because it contains several user-facing items.  However, to
> > echo what Luciano said, I'm also concerned about the timeline.  At this
> > stage, I agree that we need to release more often, and with a more
> > user-oriented "product" focus as a guide for timelines.  I.e. we should
> > orient our release timelines around items that focus on the "product" of
> > allowing the user to work on a wide range of ML problems in a simple and
> > easy manner on top of Spark.
> >
> > With that in mind, I agree that a focus on a subset of (1) and (2) would
> > be good for an immediate release, with a particular focus on Spark 2.0
> > support as a priority.
> >
> > How about we aim for a February 1st release date for the initial items?
> >
> > -Mike
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <np...@us.ibm.com> wrote:
> >>
> >> Hi Matthias,
> >>
> >> Thanks for the detailed roadmap.
> >>
> >> +1 for all the items with few modifications.
> >>
> >> 1) APIs and Language:
> >> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> >> Ensure Python and Scala MLContext have same API capability.
> >>
> >> * Remove old MLContext
> >> * Consolidate MLContext and JMLC
> >> * Full support for Scala/Python DSLs
> >> >> +1 for Python DSL except for push-down of loop structures and
> >> >> functions.
> >>
> >> * Remove old file-based transform
> >> * Scala/Python wrappers for all existing algorithms
> >> * Data converters (additional formats: e.g., libsvm; performance)
> >>
> >> 2) Updated Dependencies:
> >> * Spark 2.0 support
> >> * Matrix block library (isolated jar)
> >>
> >> 3) Compiler/Runtime Features:
> >> * GPU support (full compiler and runtime support)
> >> >> Can we break this down into phases:
> >> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the
> >> >> timeline of the phases in the JIRA.
> >>
> >> * Compressed linear algebra v2
> >> * Code generation (automatic operator fusion)
> >> * Extended parfor (full spark exploitation, micro-batch support)
> >> * Scale-up architecture (large dense blocks, numa)?
> >>
> >> 4) Tools
> >> * Extended stats (task locality, shuffle, etc)
> >> * Cloud resource advisor (extended resource optimizer)?
> >>
> >> 5) Algorithms
> >> * Graduate "staging" algorithms (robustness/performance)
> >> * Perftest: include all algorithms into automated performance tests
> >> >> via spark-submit + via Scala/Python wrappers
> >>
> >> * Simplify usage decision trees, random forest, mlogreg, msvm
> >> (preprocessing, label representation, etc)
> >> >> + command-line variable naming. For example: maxi, maxiter, etc.
> >>
> >> Thanks,
> >>
> >> Niketan Pansare
> >> IBM Almaden Research Center
> >> E-mail: npansar At us.ibm.com
> >> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> >>
> >> Matthias Boehm ---01/03/2017 02:44:39 PM---Yes indeed, most of (3) and
> >> (4) can be done incrementally. For (5), some of the changes might also
> >>
> >> From: Matthias Boehm <mb...@googlemail.com>
> >> To: dev@systemml.incubator.apache.org
> >> Date: 01/03/2017 02:44 PM
> >> Subject: Re: [DISCUSS] Roadmap SystemML 1.0
> >>
> >>
> >> Yes indeed, most of (3) and (4) can be done incrementally. For (5), some
> >> of the changes might also modify the signature of algorithms (i.e.,
> >> parameters and required input data) but it would help, for example with
> >> decision trees, as users no longer need to dummy code their inputs.
> >>
> >> Generally, I'm fine with making (3), (4), and part of (5) optional and
> >> let the "must-have" features from (1) and (2) determine the timeline.
> >>
> >> Regards,
> >> Matthias
> >>
> >> On 1/3/2017 11:27 PM, Luciano Resende wrote:
> >> > On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm <mboehm7@googlemail.com>
> >> > wrote:
> >> >
> >> >> I'd like to initiate the discussion of a concrete roadmap for our next
> >> >> release. According to previous discussions, I think it's fair to say
> >> >> that we agree on calling it SystemML 1.0. We should carefully plan this
> >> >> release as it's an opportunity to change APIs and remove some older
> >> >> deprecated features. I'd like to encourage not just developers but also
> >> >> the broader community to participate in this discussion.
> >> >>
> >> >> Personally, I think a target date of Q2/2017 is realistic. Let's start
> >> >> with collecting the major features and changes that potentially affect
> >> >> users. Here is an initial list, but please feel free to add and up- or
> >> >> down-vote the individual items.
> >> >>
> >> >> 1) APIs and Language:
> >> >> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> >> * Remove old MLContext
> >> >> * Consolidate MLContext and JMLC
> >> >> * Full support for Scala/Python DSLs
> >> >> * Remove old file-based transform
> >> >> * Scala/Python wrappers for all existing algorithms
> >> >> * Data converters (additional formats: e.g., libsvm; performance)
> >> >>
> >> >> 2) Updated Dependencies:
> >> >> * Spark 2.0 support
> >> >> * Matrix block library (isolated jar)
> >> >>
> >> >> 3) Compiler/Runtime Features:
> >> >> * GPU support (full compiler and runtime support)
> >> >> * Compressed linear algebra v2
> >> >> * Code generation (automatic operator fusion)
> >> >> * Extended parfor (full spark exploitation, micro-batch support)
> >> >> * Scale-up architecture (large dense blocks, numa)?
> >> >>
> >> >> 4) Tools
> >> >> * Extended stats (task locality, shuffle, etc)
> >> >> * Cloud resource advisor (extended resource optimizer)?
> >> >>
> >> >> 5) Algorithms
> >> >> * Graduate "staging" algorithms (robustness/performance)
> >> >> * Perftest: include all algorithms into automated performance tests
> >> >> * Simplify usage decision trees, random forest, mlogreg, msvm
> >> >> (preprocessing, label representation, etc)
> >> >>
> >> >> Items marked with a ? can potentially be moved out to subsequent
> >> >> releases.
> >> >>
> >> >>
> >> >> Regards,
> >> >> Matthias
> >> >>
> >> >
> >> > My understanding is that most of the items in 1 and 2 are going to break
> >> > backward compatibility, while the others can be done incrementally. Is
> >> > this assumption correct? If so, can we finish 1 and 2 and do a 1.0
> >> > release, and then continue with 3, 4, 5, etc.? I don't think we should
> >> > wait for 2017/Q2 to do a 1.0 release. I believe in release early, release
> >> > often, particularly to attract new users who can help verify and
> >> > contribute to specific releases.
> >> >
> >> > Thoughts?
> >> >
> >>
--
Sent from my Mobile device

Re: [DISCUSS] Roadmap SystemML 1.0

Posted by du...@gmail.com.
Now that we've had some discussion here, it would be good to transfer this discussion into a JIRA epic, containing sub tasks. That way, we can properly track our progress on these items and facilitate contributions from the community.  Note that some of the sub tasks may already exist as individual issues.

Would anyone in the community like to volunteer for creating these issues?

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 4, 2017, at 6:00 PM, dusenberrymw@gmail.com wrote:
> 
> Overall, this is a good list of items that should be worked on, particularly because it contains several user-facing items.  However, to echo what Luciano said, I'm also concerned about the timeline.  At this stage, I agree that we need to release more often, and with a more user-oriented "product" focus as a guide for timelines.  I.e. we should orient our release timelines around items that focus on the "product" of allowing the user to work on a wide range of ML problems in a simple and easy manner on top of Spark.
> 
> With that in mind, I agree that a focus on a subset of (1) and (2) would be good for an immediate release, with a particular focus on Spark 2.0 support as a priority.
> 
> How about we aim for a February 1st release date for the initial items?
> 
> -Mike
> 

Re: [DISCUSS] Roadmap SystemML 1.0

Posted by du...@gmail.com.
Overall, this is a good list of items that should be worked on, particularly because it contains several user-facing items.  However, to echo what Luciano said, I'm also concerned about the timeline.  At this stage, I agree that we need to release more often, and with a more user-oriented "product" focus as a guide for timelines.  I.e. we should orient our release timelines around items that focus on the "product" of allowing the user to work on a wide range of ML problems in a simple and easy manner on top of Spark.

With that in mind, I agree that a focus on a subset of (1) and (2) would be good for an immediate release, with a particular focus on Spark 2.0 support as a priority.

How about we aim for a February 1st release date for the initial items?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry



> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <np...@us.ibm.com> wrote:
> 
> Hi Matthias,
> 
> Thanks for the detailed roadmap. 
> 
> +1 for all the items with a few modifications.
> 
> 1) APIs and Language:
> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> Ensure Python and Scala MLContext have same API capability.
> 
> * Remove old MLContext
> * Consolidate MLContext and JMLC
> * Full support for Scala/Python DSLs
> >> +1 for Python DSL except for push-down of loop structures and functions. 
> 
> * Remove old file-based transform
> * Scala/Python wrappers for all existing algorithms
> * Data converters (additional formats: e.g., libsvm; performance)
> 
> 2) Updated Dependencies:
> * Spark 2.0 support
> * Matrix block library (isolated jar)
> 
> 3) Compiler/Runtime Features:
> * GPU support (full compiler and runtime support)
> >> Can we break this down into phases: https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the timeline of the phases in the JIRA.
> 
> * Compressed linear algebra v2
> * Code generation (automatic operator fusion)
> * Extended parfor (full spark exploitation, micro-batch support)
> * Scale-up architecture (large dense blocks, numa)?
> 
> 4) Tools
> * Extended stats (task locality, shuffle, etc)
> * Cloud resource advisor (extended resource optimizer)?
> 
> 5) Algorithms
> * Graduate "staging" algorithms (robustness/performance)
> * Perftest: include all algorithms into automated performance tests
> >> via spark-submit + via Scala/Python wrappers
> 
> * Simplify usage decision trees, random forest, mlogreg, msvm 
> (preprocessing, label representation, etc)
> >> + command-line variable naming. For example: maxi, maxiter, etc.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 

Re: [DISCUSS] Roadmap SystemML 1.0

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Matthias,

Thanks for the detailed roadmap.

+1 for all the items with a few modifications.

1) APIs and Language:
* Cleanup new MLContext (matrix/frame data types, move tests, etc)
>> Ensure Python and Scala MLContext have same API capability.

* Remove old MLContext
* Consolidate MLContext and JMLC
* Full support for Scala/Python DSLs
>> +1 for Python DSL except for push-down of loop structures and functions.


* Remove old file-based transform
* Scala/Python wrappers for all existing algorithms
* Data converters (additional formats: e.g., libsvm; performance)

2) Updated Dependencies:
* Spark 2.0 support
* Matrix block library (isolated jar)

3) Compiler/Runtime Features:
* GPU support (full compiler and runtime support)
>> Can we break this down into phases:
https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the
timeline of the phases in the JIRA.

* Compressed linear algebra v2
* Code generation (automatic operator fusion)
* Extended parfor (full spark exploitation, micro-batch support)
* Scale-up architecture (large dense blocks, numa)?

4) Tools
* Extended stats (task locality, shuffle, etc)
* Cloud resource advisor (extended resource optimizer)?

5) Algorithms
* Graduate "staging" algorithms (robustness/performance)
* Perftest: include all algorithms into automated performance tests
>> via spark-submit + via Scala/Python wrappers

* Simplify usage decision trees, random forest, mlogreg, msvm
(preprocessing, label representation, etc)
>> + command-line variable naming. For example: maxi, maxiter, etc.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Matthias Boehm <mb...@googlemail.com>
To:	dev@systemml.incubator.apache.org
Date:	01/03/2017 02:44 PM
Subject:	Re: [DISCUSS] Roadmap SystemML 1.0



Yes indeed, most of (3) and (4) can be done incrementally. For (5), some
of the changes might also modify the signature of algorithms (i.e.,
parameters and required input data) but it would help, for example with
decision trees, as users would no longer need to dummy-code their inputs.

Generally, I'm fine with making (3), (4), and part of (5) optional and
letting the "must-have" features from (1) and (2) determine the timeline.

Regards,
Matthias


Re: [DISCUSS] Roadmap SystemML 1.0

Posted by Matthias Boehm <mb...@googlemail.com>.
Yes indeed, most of (3) and (4) can be done incrementally. For (5), some
of the changes might also modify the signature of algorithms (i.e.,
parameters and required input data) but it would help, for example with
decision trees, as users would no longer need to dummy-code their inputs.

Generally, I'm fine with making (3), (4), and part of (5) optional and
letting the "must-have" features from (1) and (2) determine the timeline.

Regards,
Matthias

On 1/3/2017 11:27 PM, Luciano Resende wrote:
> My understanding is that most of the items in 1 and 2 are going to break
> backward compatibility, while the others can be done incrementally. Is this
> assumption correct? If so, can we finish 1 and 2 and do a 1.0 release, and
> then continue with 3, 4, 5, etc.? I don't think we should wait for
> 2017/Q2 to do a 1.0 release. I believe in release early, release often,
> particularly to attract new users, who can help verify and contribute
> to specific releases.
>
> Thoughts ?
>

Re: [DISCUSS] Roadmap SystemML 1.0

Posted by Luciano Resende <lu...@gmail.com>.
On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm <mb...@googlemail.com>
wrote:

> I'd like to initiate the discussion of a concrete roadmap for our next
> release. According to previous discussions, I think it's fair to say
> that we agree on calling it SystemML 1.0. We should carefully plan this
> release as it's an opportunity to change APIs and remove some older
> deprecated features. I'd like to encourage not just developers but also the
> broader community to participate in this discussion.
>
> Personally, I think a target date of Q2/2017 is realistic. Let's start
> with collecting the major features and changes that potentially affect
> users. Here is an initial list, but please feel free to add and up- or
> down-vote the individual items.
>
> 1) APIs and Language:
> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> * Remove old MLContext
> * Consolidate MLContext and JMLC
> * Full support for Scala/Python DSLs
> * Remove old file-based transform
> * Scala/Python wrappers for all existing algorithms
> * Data converters (additional formats: e.g., libsvm; performance)
>
> 2) Updated Dependencies:
> * Spark 2.0 support
> * Matrix block library (isolated jar)
>
> 3) Compiler/Runtime Features:
> * GPU support (full compiler and runtime support)
> * Compressed linear algebra v2
> * Code generation (automatic operator fusion)
> * Extended parfor (full spark exploitation, micro-batch support)
> * Scale-up architecture (large dense blocks, numa)?
>
> 4) Tools
> * Extended stats (task locality, shuffle, etc)
> * Cloud resource advisor (extended resource optimizer)?
>
> 5) Algorithms
> * Graduate "staging" algorithms (robustness/performance)
> * Perftest: include all algorithms into automated performance tests
> * Simplify usage decision trees, random forest, mlogreg, msvm
> (preprocessing, label representation, etc)
>
> Items marked with a ? can potentially be moved out to subsequent releases.
>
>
> Regards,
> Matthias
>

My understanding is that most of the items in 1 and 2 are going to break
backward compatibility, while the others can be done incrementally. Is this
assumption correct? If so, can we finish 1 and 2 and do a 1.0 release, and
then continue with 3, 4, 5, etc.? I don't think we should wait for
2017/Q2 to do a 1.0 release. I believe in release early, release often,
particularly to attract new users, who can help verify and contribute
to specific releases.

Thoughts ?

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/