Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2016/08/04 16:58:20 UTC

[jira] [Created] (SOLR-9384) Add randomization to the train Streaming Expression to support very large training sets

Joel Bernstein created SOLR-9384:
------------------------------------

             Summary: Add randomization to the train Streaming Expression to support very large training sets
                 Key: SOLR-9384
                 URL: https://issues.apache.org/jira/browse/SOLR-9384
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Joel Bernstein


The *train* Streaming Expression optimizes a logistic regression model on text.

The initial implementation instantiates a doc vector for each document in the training set on each iteration. The doc vectors are held in memory, so the size of the training set is limited by available memory.

This ticket will add randomization to the algorithm so that only a random subset of documents from the training set is processed on each iteration.

This will allow the train Streaming Expression to be run on much larger training sets.
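
The idea described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the Solr implementation: the names train_randomized and predict, the sparse dict representation of a doc vector, and the parameters sample_size, learning_rate and seed are all assumptions made for the example. It shows logistic regression fitted by gradient descent where each iteration touches only a random sample of the training set, so the number of doc vectors needed at once is bounded by the sample size rather than by the size of the training set.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_randomized(docs, labels, num_features, iterations=200,
                     sample_size=32, learning_rate=0.5, seed=0):
    """Fit logistic regression using a random sample of docs per iteration.

    docs:   list of sparse doc vectors as {feature_index: value} dicts
            (a hypothetical stand-in for the real doc vectors).
    labels: list of 0/1 outcomes aligned with docs.
    """
    rng = random.Random(seed)
    weights = [0.0] * (num_features + 1)  # last slot is the bias term
    k = min(sample_size, len(docs))
    for _ in range(iterations):
        # Only k documents are processed in this iteration, so only
        # k doc vectors would need to be materialized at a time.
        batch = rng.sample(range(len(docs)), k)
        gradient = [0.0] * len(weights)
        for i in batch:
            z = weights[-1] + sum(weights[j] * v for j, v in docs[i].items())
            error = sigmoid(z) - labels[i]
            for j, v in docs[i].items():
                gradient[j] += error * v
            gradient[-1] += error
        # Average the gradient over the sample and take one step.
        for j in range(len(weights)):
            weights[j] -= learning_rate * gradient[j] / k
    return weights


def predict(weights, doc):
    """Return the modeled probability that doc belongs to the positive class."""
    z = weights[-1] + sum(weights[j] * v for j, v in doc.items())
    return sigmoid(z)
```

In this sketch the full list of docs is still passed in for simplicity. In a real system the sample would presumably be fetched from the index on each iteration, which is where the memory saving would come from.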


