Posted to dev@madlib.apache.org by "FENG, Xixuan (Aaron)" <xi...@gmail.com> on 2019/10/08 03:41:11 UTC

[REPORT] MADlib - October 2019

## Description:

- Apache MADlib is a scalable, big data, SQL-driven machine learning framework
  for data scientists.


## Issues:

- There are no issues requiring board attention at this time.


## Activity:

- The community is at work on the 1.17 release, which will be the 7th release
  as an Apache top-level project (TLP). Main JIRAs include:
* feature improvements for deep learning including training multiple models
  in parallel for parameter selection (hyper-parameter tuning and model
  architecture search), inference on models trained outside of MADlib, and
  performance improvements to mini-batch preprocessor
* performance improvements to correlation/covariance, association rules, and
  weakly connected components graph algorithm
* stopping criteria on LDA using perplexity
* automatic selection of the number of centroids for k-means clustering

- After that will be the 2.0 release with JIRAs related to versioning models.

- Nikhil Kak and Nandish Jayaram (MADlib committers and PMC members)
  presented a community call on 2019-Aug-1 on the MADlib 1.16 release features:
  https://www.youtube.com/watch?v=uLW5By66Lf0

- Yuhao Zhang, a PhD candidate at University of California, San Diego,
  completed his internship at Pivotal in Palo Alto on parameter selection in
  MADlib, which is an important area for deep learning practitioners. Yuhao's
  advisor at UCSD is Arun Kumar in the Department of Computer Science and
  Engineering, whose research has contributed to MADlib in the past. A
  presentation by Yuhao on his work on MADlib is at:
  https://www.youtube.com/watch?v=aZlKXqhyRKY

## Health report:

The community is relatively small but very engaged, with robust mailing list
traffic, interest in doing frequent releases, and new functionality being
developed by contributors.

The number of developers actively contributing to the code/documentation is
approximately 7 in the 3rd quarter of calendar year 2019.

We are continually on the lookout for new community members to invite as
committers or PMC members.


## PMC changes:

- No changes in the last quarter. The PMC currently stands at 14 members.


## Committer base changes:

- Currently 17 committers.

- New committers added since last report: Ekta Khanna (2018-07-27), Himanshu
  Pandey (2018-07-27), and Domino Valdano (2018-07-27)


## Releases:

- Next release: v1.17 planned for 4Q2019

- v1.16.0 released on 2019-07-08

- v1.15.1 released on 2018-10-15

- v1.15.0 released on 2018-08-10


## Mailing list activity:

Average monthly mailing list activity was 503 posts to dev@ and 11 posts to
user@ over the last 3 months (Jul-Sep 2019).


## JIRA Statistics:

- 3 JIRA tickets created in the last month

- 3 JIRA tickets resolved in the last month