Posted to issues@spark.apache.org by "Gang Bai (JIRA)" <ji...@apache.org> on 2014/06/27 05:40:24 UTC

[jira] [Created] (SPARK-2303) Poisson regression model for count data

Gang Bai created SPARK-2303:
-------------------------------

             Summary: Poisson regression model for count data
                 Key: SPARK-2303
                 URL: https://issues.apache.org/jira/browse/SPARK-2303
             Project: Spark
          Issue Type: Bug
          Components: MLlib
            Reporter: Gang Bai


Modeling count data is of great importance in solving real-world statistical problems. Currently, mllib.regression mostly provides models for real-valued responses, i.e., curve fitting with various regularizations on the resulting weights, but it still lacks support for modeling count data.

A very basic model for this is Poisson regression. Following the patterns in mllib and reusing its components, we estimate the parameters of Poisson regression by maximum likelihood. In detail, to add Poisson regression to mllib.regression, we need to:

 # Add the gradient of the negative log-likelihood to mllib/optimization/Gradients.scala.
 # Add an implementation of PoissonRegressionModel, which extends GeneralizedLinearModel with RegressionModel and must implement the predict method.
 # Add implementations of the generalized linear algorithm class. Either GradientDescent or LBFGS can serve as the optimizer, so we implement both, as the classes PoissonRegressionWithSGD and PoissonRegressionWithLBFGS, respectively.
 # Add the companion objects PoissonRegressionWithSGD and PoissonRegressionWithLBFGS as drivers.
 # Add test suites:
 ## Test the gradient computation.
 ## Test the regression method using generated data.
 ## Test the regression method using a real-world data set.
 # Add documentation.
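
For reference, the quantities behind steps 1 and 2 can be written out as below. This assumes the canonical log link and a feature vector x that already includes any intercept term; the issue itself does not fix the link function. For one example with features x and count y ∈ {0, 1, 2, ...}:

```latex
\begin{aligned}
y \mid x &\sim \operatorname{Poisson}(\lambda), \qquad \lambda = \exp(w^\top x) \\
\ell(w; x, y) &= -\log p(y \mid x; w) = \exp(w^\top x) - y\, w^\top x + \log(y!) \\
\nabla_w \ell(w; x, y) &= \bigl(\exp(w^\top x) - y\bigr)\, x \\
\nabla_w^2 \ell(w; x, y) &= \exp(w^\top x)\, x x^\top \succeq 0 \\
\hat{y} &= \mathbb{E}[y \mid x] = \exp(w^\top x)
\end{aligned}
```

The log(y!) term does not depend on w, so the gradient computation can drop it. Because each per-example Hessian is positive semidefinite, the summed negative log-likelihood is convex in w, so either GradientDescent or LBFGS should be a reasonable optimizer for step 3.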
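
A rough, single-machine sketch of steps 1 to 3 and of the checks in step 5 follows, in plain Scala without any Spark dependency. It assumes the log link, dense Array[Double] features with a constant-1 entry standing in for the intercept, and full-batch gradient descent in place of MLlib's distributed SGD and L-BFGS. The names PoissonSketch, loss, gradient, predict and fit are hypothetical and are not part of MLlib.

```scala
// Spark-free sketch of Poisson regression with the (assumed) log link.
// PoissonSketch and its methods are hypothetical names, not MLlib's API.
object PoissonSketch {
  // Dot product w . x of two dense vectors of equal length.
  private def dot(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "dimension mismatch")
    var s = 0.0
    var i = 0
    while (i < w.length) { s += w(i) * x(i); i += 1 }
    s
  }

  // Per-example negative log-likelihood exp(w.x) - y * (w.x),
  // with the w-independent log(y!) term dropped.
  def loss(w: Array[Double], x: Array[Double], y: Double): Double = {
    val margin = dot(w, x)
    math.exp(margin) - y * margin
  }

  // Gradient of `loss` with respect to w: (exp(w.x) - y) * x  (step 1).
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val multiplier = math.exp(dot(w, x)) - y
    x.map(_ * multiplier)
  }

  // Predicted mean count E[y | x] = exp(w.x)  (step 2).
  def predict(w: Array[Double], x: Array[Double]): Double = math.exp(dot(w, x))

  // Full-batch gradient descent on the average loss, starting from w = 0.
  // A single-machine stand-in for the SGD / L-BFGS optimizers of step 3.
  def fit(data: Seq[(Array[Double], Double)], stepSize: Double, numIterations: Int): Array[Double] = {
    require(data.nonEmpty, "no training data")
    val dim = data.head._1.length
    val w = Array.fill(dim)(0.0)
    for (_ <- 1 to numIterations) {
      // Accumulate the summed gradient over all examples.
      val total = Array.fill(dim)(0.0)
      for ((x, y) <- data) {
        val g = gradient(w, x, y)
        var j = 0
        while (j < dim) { total(j) += g(j); j += 1 }
      }
      // Step against the averaged gradient.
      var j = 0
      while (j < dim) { w(j) -= stepSize * total(j) / data.size; j += 1 }
    }
    w
  }
}
```

As a usage example, generating targets as y = exp(w·x) for fixed weights (noise-free, so those weights are the exact minimizer) and calling PoissonSketch.fit(data, 0.05, 4000) should recover them closely; with sampled Poisson counts the estimate would only match up to sampling error. A finite-difference comparison against loss is a cheap way to test gradient, in the spirit of the first sub-item of step 5.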


