Posted to dev@madlib.apache.org by Frank McQuillan <fm...@pivotal.io> on 2016/04/14 02:28:20 UTC
Apache MADlib (incubating) candidate roadmap 2016
Hello MADlib users and developers,
Now that the 1.9 release is out the door, I put up a candidate roadmap on
the MADlib wiki
https://cwiki.apache.org/confluence/display/MADLIB/Roadmap
that I would like to get the community’s input on. It is also copied below
in this email. This roadmap is roughly for the remainder of 2016.
Please feel free to share your thoughts on the proposal.
The roadmap is based on input from the mailing lists, JIRA issues, pull
requests, user feedback, conferences, etc.
Predictive models
Novelty detection using 1-class SVM
https://issues.apache.org/jira/browse/MADLIB-990
Mixed effects modeling https://issues.apache.org/jira/browse/MADLIB-987
k-nearest neighbors (kNN) https://issues.apache.org/jira/browse/MADLIB-927
MCMC Probit and Logit regression
Factorization machines
Gaussian Mixture Model using Expectation Maximization (EM) algorithm
Multi-layer Perceptron
Geographically Weighted Regression (GWR)
Graph
Shortest path https://issues.apache.org/jira/browse/MADLIB-992
Standard traversal
  depth-first search
  breadth-first search
  topological sort
One-mode projection (converting a bipartite user-item graph to a
user-user or item-item graph)
Connected components
PageRank
Hierarchical graph cut
Betweenness centrality
Minimum spanning tree
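To make the one-mode projection item concrete, here is a minimal Python sketch. It is not MADlib code and not a proposed MADlib interface; the function name one_mode_projection and the sample data are hypothetical. It assumes the bipartite graph is given as a list of (user, item) edge pairs and links two users whenever they share at least one item, weighting each user-user edge by the number of shared items. The item-item projection is the same computation with the roles swapped.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(edges):
    """Project a bipartite (left, right) edge list onto the left side.

    Returns a dict mapping each unordered left-node pair (u, v), u < v,
    to the number of right-side nodes both are connected to.
    (Hypothetical helper for illustration only.)
    """
    # Group left-side nodes (e.g. users) by the right-side node (e.g. item)
    # they connect to; sets ignore duplicate edges.
    left_by_right = defaultdict(set)
    for left, right in edges:
        left_by_right[right].add(left)

    # Every pair of left nodes sharing a right node gets one unit of weight.
    weights = defaultdict(int)
    for group in left_by_right.values():
        for u, v in combinations(sorted(group), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical example: four users connected to four items.
edges = [("alice", "i1"), ("bob", "i1"), ("alice", "i2"),
         ("bob", "i2"), ("carol", "i2"), ("carol", "i3"), ("dave", "i4")]

print(one_mode_projection(edges))
# {('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1}

# Item-item projection: swap the roles of users and items.
print(one_mode_projection([(item, user) for user, item in edges]))
# {('i1', 'i2'): 2, ('i2', 'i3'): 1}
```

Users or items that share nothing with any other node (dave and i4 above) simply drop out of the projected graph in this sketch; a real implementation might choose to keep them as isolated vertices.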
Utilities
Path functions (phase 2) https://issues.apache.org/jira/browse/MADLIB-977
Prediction metrics https://issues.apache.org/jira/browse/MADLIB-907
Sessionization https://issues.apache.org/jira/browse/MADLIB-909
Pivoting https://issues.apache.org/jira/browse/MADLIB-908
Anonymization https://issues.apache.org/jira/browse/MADLIB-911
URI tools https://issues.apache.org/jira/browse/MADLIB-910
Stratified sampling https://issues.apache.org/jira/browse/MADLIB-986
Usability
Expand coverage for PivotalR
Expand coverage for PMML export
Interface improvement and consistency
Implement an interface using named parameters
Python API
Performance and scalability
Work around PostgreSQL 1 GB field size limit
https://issues.apache.org/jira/browse/MADLIB-991
Platform
Support for PostgreSQL 9.5
https://issues.apache.org/jira/browse/MADLIB-944
The next release, v1.9.1, is planned for the end of June. The JIRAs
targeted for this release so far are listed here:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.9.1%20ORDER%20BY%20summary%20ASC%2C%20key%20ASC%2C%20priority%20DESC
I’d like to encourage us to get on a quarterly release cycle if at all
possible.
Regards,
Frank
Re: Apache MADlib (incubating) candidate roadmap 2016
Posted by Frank McQuillan <fm...@pivotal.io>.
Thanks, Roman, for your ongoing support and comments.
I would suggest shorter, regular releases that are approximately date
driven. That is what I would propose for Apache MADlib from now on.
The reason I say "approximately date driven" is that it is hard to commit
to releasing on quarterly boundaries if something in flight is nearly
done, or if scale testing has not completed yet. On that last example, we
run scale tests internally at Pivotal on a DCA for Apache HAWQ and
Greenplum, and the tests take 2 weeks to run if there are no
infrastructure issues. If there are, we incur delays.
If the community agrees, let's aim for the following in 2016:
* MADlib 1.9.1 end June
* MADlib 1.9.2 end Sept
* MADlib 2.0 end Dec
It would also be great to see features from each of our main themes in
each release: predictive models, graph, utilities, usability, performance,
and platform.
Frank
On Mon, Apr 18, 2016 at 7:37 PM, Roman Shaposhnik <ro...@shaposhnik.org>
wrote:
> Hi Frank!
>
> first of all -- this is a pretty awesome effort! Thank you!
>
> Quick question that is complementary to the roadmap: what is the release
> model that you're proposing? Is it purely feature driven or date-driven?
> I kind of see both in this document.
>
> Personally, I'm a big fan of date-driven models (think Ubuntu), especially
> for incubating projects, which have to master the fine art of
> an ASF release. Is a date-driven model possible/desirable for MADlib?
>
> Thanks,
> Roman.
Re: Apache MADlib (incubating) candidate roadmap 2016
Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
Hi Frank!
First of all -- this is a pretty awesome effort! Thank you!
Quick question that is complementary to the roadmap: what is the release
model that you're proposing? Is it purely feature driven or date-driven?
I kind of see both in this document.
Personally, I'm a big fan of date-driven models (think Ubuntu), especially
for incubating projects, which have to master the fine art of
an ASF release. Is a date-driven model possible/desirable for MADlib?
Thanks,
Roman.