Posted to dev@madlib.apache.org by Frank McQuillan <fm...@pivotal.io> on 2016/04/14 02:28:20 UTC

Apache MADlib (incubating) candidate roadmap 2016

Hello MADlib users and developers,

Now that the 1.9 release is out the door, I put up a candidate roadmap on
the MADlib wiki
https://cwiki.apache.org/confluence/display/MADLIB/Roadmap
that I would like to get the community’s input on.  It is also copied below
in this email.  This roadmap is roughly for the remainder of 2016.

Please feel free to share your thoughts on the proposal.

It is based on the mailing lists, JIRA issues, pull requests, feedback from
users, conferences, etc.


Predictive models

Novelty detection using 1-class SVM
https://issues.apache.org/jira/browse/MADLIB-990
Mixed effects modeling   https://issues.apache.org/jira/browse/MADLIB-987
k-nearest neighbors (kNN)   https://issues.apache.org/jira/browse/MADLIB-927
MCMC Probit and Logit regression
Factorization machines
Gaussian Mixture Model using Expectation Maximization (EM) algorithm
Multi-layer Perceptron
Geographically Weighted Regression (GWR)


Graph

Shortest path https://issues.apache.org/jira/browse/MADLIB-992
Standard traversal
   depth first search
   breadth first search
   topological sort
One-mode projection (converting a bipartite user-item graph to a
user-user or item-item graph)
Connected components
PageRank
Hierarchical graph cut
Betweenness centrality
Minimum spanning tree
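
To make the one-mode projection item above concrete, here is a minimal
Python sketch. It is not MADlib code and does not reflect any planned
MADlib interface; the function name, the edge-list input format, and the
choice of "number of shared items" as the edge weight are all assumptions
made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(edges):
    """Project a bipartite (left, right) edge list onto the left node set.

    Hypothetical helper: two left-side nodes (e.g. users) are linked when
    they share at least one right-side node (e.g. an item). The weight of
    the link is the number of right-side nodes they share.
    """
    # Group left-side nodes by the right-side node they connect to.
    left_by_right = defaultdict(set)
    for left, right in edges:
        left_by_right[right].add(left)

    # Every pair of left-side nodes sharing a right-side node gets +1.
    weights = defaultdict(int)
    for group in left_by_right.values():
        for u, v in combinations(sorted(group), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Illustrative user-item edges (made-up data).
edges = [("alice", "book"), ("bob", "book"), ("bob", "film"), ("carol", "film")]

# User-user projection: users linked through shared items.
user_user = one_mode_projection(edges)

# Item-item projection: flip each edge and reuse the same function.
item_item = one_mode_projection([(item, user) for user, item in edges])
```

Flipping the edge direction is enough to get the item-item graph, which
is why a single function covers both cases mentioned in the roadmap item.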


Utilities

Path functions (phase 2)   https://issues.apache.org/jira/browse/MADLIB-977
Prediction metrics   https://issues.apache.org/jira/browse/MADLIB-907
Sessionization   https://issues.apache.org/jira/browse/MADLIB-909
Pivoting   https://issues.apache.org/jira/browse/MADLIB-908
Anonymization   https://issues.apache.org/jira/browse/MADLIB-911
URI tools   https://issues.apache.org/jira/browse/MADLIB-910
Stratified sampling   https://issues.apache.org/jira/browse/MADLIB-986


Usability

Expand coverage for PivotalR
Expand coverage for PMML export
Interface improvement and consistency
Implement an interface using named parameters
Python API


Performance and scalability

Work around PostgreSQL 1 GB field size limit
https://issues.apache.org/jira/browse/MADLIB-991


Platform

Support for PostgreSQL 9.5
https://issues.apache.org/jira/browse/MADLIB-944


The next release v1.9.1 is at the end of June, and the JIRAs so far
targeted for this release are
https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.9.1%20ORDER%20BY%20summary%20ASC%2C%20key%20ASC%2C%20priority%20DESC

I’d like to encourage us to get on a quarterly release cycle if at all
possible.

Regards,
Frank

Re: Apache MADlib (incubating) candidate roadmap 2016

Posted by Frank McQuillan <fm...@pivotal.io>.
Thanks, Roman, for your ongoing support and comments.

I would suggest shorter, regular releases that are approximately date
driven.  That is what I would propose with Apache MADlib from now on.

The reason I say "approximately date driven" is that it is hard to say we
will release on quarterly boundaries if there is something in flight that
is nearly done, or some scale testing that has not completed yet, etc.  On
that last example, we run scale tests internally at Pivotal on a DCA for
Apache HAWQ and Greenplum, and it takes 2 weeks for the tests to run if
there are no infrastructure issues.  If there are, we incur delays.

If the community agrees, let's aim for the following in 2016:

* MADlib 1.9.1 end June
* MADlib 1.9.2 end Sept
* MADlib 2.0 end Dec

And it would be great to see some features from each of our main themes in
each release: pred models/graph/utilities/usability/performance/platform.

Frank



On Mon, Apr 18, 2016 at 7:37 PM, Roman Shaposhnik <ro...@shaposhnik.org>
wrote:

> Hi Frank!
>
> first of all -- this is a pretty awesome effort! Thank you!
>
> Quick question that is complementary to the roadmap: what is the release
> model that you're proposing? Is it purely feature driven or date-driven?
> I kind of see both in this document.
>
> Personally, I'm a big fan of date driven models (think Ubuntu) especially
> for the incubating projects where they have to master the fine art of
> an ASF release. Is date-driven model possible/desirable for MADlib?
>
> Thanks,
> Roman.
>
> On Wed, Apr 13, 2016 at 5:28 PM, Frank McQuillan <fm...@pivotal.io>
> wrote:
> > Hello MADlib users and developers,
> >
> > Now that the 1.9 release is out the door, I put up a candidate roadmap on
> > the MADlib wiki
> > https://cwiki.apache.org/confluence/display/MADLIB/Roadmap
> > that I would like to get the community’s input on.  It is also copied below
> > in this email.  This roadmap is roughly for the remainder of 2016.
> >
> > Please feel free to share your thoughts on the proposal.
> >
> > It is based on the mailing lists, JIRA issues, pull requests, feedback from
> > users, conferences, etc.
> >
> >
> > Predictive models
> >
> > Novelty detection using 1-class SVM
> > https://issues.apache.org/jira/browse/MADLIB-990
> > Mixed effects modeling   https://issues.apache.org/jira/browse/MADLIB-987
> > k-nearest neighbors (kNN)   https://issues.apache.org/jira/browse/MADLIB-927
> > MCMC Probit and Logit regression
> > Factorization machines
> > Gaussian Mixture Model using Expectation Maximization (EM) algorithm
> > Multi-layer Perceptron
> > Geographically Weighted Regression (GWR)
> >
> >
> > Graph
> >
> > Shortest path https://issues.apache.org/jira/browse/MADLIB-992
> > Standard traversal
> >    depth first search
> >    breadth first search
> >    topological sort
> > One-mode projection (converting a bipartite user-item graph to a
> > user-user or item-item graph)
> > Connected components
> > PageRank
> > Hierarchical graph cut
> > Betweenness centrality
> > Minimum spanning tree
> >
> >
> > Utilities
> >
> > Path functions (phase 2)   https://issues.apache.org/jira/browse/MADLIB-977
> > Prediction metrics   https://issues.apache.org/jira/browse/MADLIB-907
> > Sessionization   https://issues.apache.org/jira/browse/MADLIB-909
> > Pivoting   https://issues.apache.org/jira/browse/MADLIB-908
> > Anonymization   https://issues.apache.org/jira/browse/MADLIB-911
> > URI tools   https://issues.apache.org/jira/browse/MADLIB-910
> > Stratified sampling   https://issues.apache.org/jira/browse/MADLIB-986
> >
> >
> > Usability
> >
> > Expand coverage for PivotalR
> > Expand coverage for PMML export
> > Interface improvement and consistency
> > Implement an interface using named parameters
> > Python API
> >
> >
> > Performance and scalability
> >
> > Work around PostgreSQL 1 GB field size limit
> > https://issues.apache.org/jira/browse/MADLIB-991
> >
> >
> > Platform
> >
> > Support for PostgreSQL 9.5
> > https://issues.apache.org/jira/browse/MADLIB-944
> >
> >
> > The next release v1.9.1 is at the end of June, and the JIRAs so far targeted
> > for this release are
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.9.1%20ORDER%20BY%20summary%20ASC%2C%20key%20ASC%2C%20priority%20DESC
> >
> > I’d like to encourage us to get on a quarterly release cycle if at all
> > possible.
> >
> > Regards,
> > Frank
> >
> >
> >
> >
> >
> >
>

Re: Apache MADlib (incubating) candidate roadmap 2016

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
Hi Frank!

first of all -- this is a pretty awesome effort! Thank you!

Quick question that is complementary to the roadmap: what is the release
model that you're proposing? Is it purely feature driven or date-driven?
I kind of see both in this document.

Personally, I'm a big fan of date driven models (think Ubuntu) especially
for the incubating projects where they have to master the fine art of
an ASF release. Is date-driven model possible/desirable for MADlib?

Thanks,
Roman.

On Wed, Apr 13, 2016 at 5:28 PM, Frank McQuillan <fm...@pivotal.io> wrote:
> Hello MADlib users and developers,
>
> Now that the 1.9 release is out the door, I put up a candidate roadmap on
> the MADlib wiki
> https://cwiki.apache.org/confluence/display/MADLIB/Roadmap
> that I would like to get the community’s input on.  It is also copied below
> in this email.  This roadmap is roughly for the remainder of 2016.
>
> Please feel free to share your thoughts on the proposal.
>
> It is based on the mailing lists, JIRA issues, pull requests, feedback from
> users, conferences, etc.
>
>
> Predictive models
>
> Novelty detection using 1-class SVM
> https://issues.apache.org/jira/browse/MADLIB-990
> Mixed effects modeling   https://issues.apache.org/jira/browse/MADLIB-987
> k-nearest neighbors (kNN)   https://issues.apache.org/jira/browse/MADLIB-927
> MCMC Probit and Logit regression
> Factorization machines
> Gaussian Mixture Model using Expectation Maximization (EM) algorithm
> Multi-layer Perceptron
> Geographically Weighted Regression (GWR)
>
>
> Graph
>
> Shortest path https://issues.apache.org/jira/browse/MADLIB-992
> Standard traversal
>    depth first search
>    breadth first search
>    topological sort
> One-mode projection (converting a bipartite user-item graph to a
> user-user or item-item graph)
> Connected components
> PageRank
> Hierarchical graph cut
> Betweenness centrality
> Minimum spanning tree
>
>
> Utilities
>
> Path functions (phase 2)   https://issues.apache.org/jira/browse/MADLIB-977
> Prediction metrics   https://issues.apache.org/jira/browse/MADLIB-907
> Sessionization   https://issues.apache.org/jira/browse/MADLIB-909
> Pivoting   https://issues.apache.org/jira/browse/MADLIB-908
> Anonymization   https://issues.apache.org/jira/browse/MADLIB-911
> URI tools   https://issues.apache.org/jira/browse/MADLIB-910
> Stratified sampling   https://issues.apache.org/jira/browse/MADLIB-986
>
>
> Usability
>
> Expand coverage for PivotalR
> Expand coverage for PMML export
> Interface improvement and consistency
> Implement an interface using named parameters
> Python API
>
>
> Performance and scalability
>
> Work around PostgreSQL 1 GB field size limit
> https://issues.apache.org/jira/browse/MADLIB-991
>
>
> Platform
>
> Support for PostgreSQL 9.5
> https://issues.apache.org/jira/browse/MADLIB-944
>
>
> The next release v1.9.1 is at the end of June, and the JIRAs so far targeted
> for this release are
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.9.1%20ORDER%20BY%20summary%20ASC%2C%20key%20ASC%2C%20priority%20DESC
>
> I’d like to encourage us to get on a quarterly release cycle if at all
> possible.
>
> Regards,
> Frank
>
>
>
>
>
>
