
Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2018/04/06 15:53:00 UTC

[jira] [Created] (SOLR-12197) Implement sampling for logistic regression classifier

Joel Bernstein created SOLR-12197:
-------------------------------------

             Summary: Implement sampling for logistic regression classifier
                 Key: SOLR-12197
                 URL: https://issues.apache.org/jira/browse/SOLR-12197
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Joel Bernstein


Currently the *train* function trains a logistic regression model by iterating over the entire distributed training set on each pass. On each iteration, every shard builds a matrix with one row per training example stored on that shard and one column per feature. With large training sets and large feature sets, these matrices can become very large.
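To give a sense of scale, here is a rough per-shard estimate. It assumes a dense matrix of 8-byte doubles and ignores array overhead; the ticket does not say how the matrix is stored, and the shard size and feature count below are hypothetical.

```python
def dense_matrix_bytes(rows: int, features: int, bytes_per_value: int = 8) -> int:
    """Rough size of a dense rows x features matrix.

    Assumes one 8-byte double per cell and ignores object/array overhead.
    """
    return rows * features * bytes_per_value


# Hypothetical shard: 10 million training examples, 1,000 features.
full_pass = dense_matrix_bytes(10_000_000, 1_000)

print(f"full pass: {full_pass / 1e9:.1f} GB")  # full pass: 80.0 GB
```

The size grows linearly in both the number of rows on the shard and the number of features, and under the assumption above it is rebuilt on every iteration.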

This ticket will add a *sample* parameter that limits the training data used on each iteration to a random sample of the training set.
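A minimal single-process sketch of the idea, assuming plain batch gradient descent as the training loop (the ticket does not describe the optimizer Solr uses). `train`, `predict`, and the `sample` argument are hypothetical names that mirror the ticket; they are not Solr's API. In Solr the sampling would presumably happen per shard, and whether *sample* is a per-shard or collection-wide size is not stated in the ticket; this sketch does not model distribution.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# (feature vector, label in {0, 1})
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    # Branch on sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    # The last weight is the bias term.
    z = weights[-1] + sum(w * xi for w, xi in zip(weights[:-1], x))
    return sigmoid(z)


def train(examples: List[Example], iterations: int, learning_rate: float,
          sample: Optional[int] = None, seed: int = 0) -> List[float]:
    """Batch gradient descent for logistic regression (hypothetical sketch).

    If `sample` is set, each iteration computes the gradient on a fresh
    random sample (without replacement) of `sample` examples instead of
    on the whole training set.
    """
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        if sample is None or sample >= len(examples):
            batch = examples
        else:
            batch = rng.sample(examples, sample)
        grad = [0.0] * len(weights)
        for x, y in batch:
            err = predict(weights, x) - y  # derivative of log-loss w.r.t. z
            for j, xi in enumerate(x):
                grad[j] += err * xi
            grad[-1] += err
        # Average the gradient over the batch and take one step.
        for j in range(len(weights)):
            weights[j] -= learning_rate * grad[j] / len(batch)
    return weights


# Toy data: one feature in [-1, 1], label 1 when the feature is positive.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(1000)]
examples = [([x], 1 if x > 0 else 0) for x in xs]

# Each iteration looks at only 50 of the 1,000 examples.
weights = train(examples, iterations=200, learning_rate=1.0, sample=50)
print(predict(weights, [0.8]), predict(weights, [-0.8]))
```

With `sample` set, the work per iteration depends on the sample size rather than on the size of the training set, at the cost of a noisier gradient estimate. Drawing a new sample on every iteration means the model still sees different parts of the data over the course of training.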



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
