Posted to issues@opennlp.apache.org by "Vinh Khuc (JIRA)" <ji...@apache.org> on 2014/04/23 05:39:14 UTC

[jira] [Commented] (OPENNLP-671) Add L1-regularization into L-BFGS

    [ https://issues.apache.org/jira/browse/OPENNLP-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977796#comment-13977796 ] 

Vinh Khuc commented on OPENNLP-671:
-----------------------------------

Attached is the patch for L1-LBFGS. An implementation of ElasticNet (i.e. L1- and L2-regularization combined) is also included. During L-BFGS training:
if L1Cost > 0 and L2Cost = 0, L1-regularization will be used,
if L1Cost = 0 and L2Cost > 0, L2-regularization will be used,
and ElasticNet will be used if both costs are set to values > 0 (see the parameter sketch below).
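
For example, a params file along the lines of the attached qn-trainer-l1.params could look like the sketch below (the Algorithm value and the Iterations/Cutoff settings depend on the OpenNLP version; only L1Cost and L2Cost are specific to this patch, and the cost values here are just placeholders):

    Algorithm = MAXENT_QN
    Iterations = 300
    Cutoff = 0
    L1Cost = 0.1
    L2Cost = 0.0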

As shown in the attached log files, L1-regularization gives very good accuracy on the NL-PER data set. Moreover, the trained model is much smaller than the one trained with L2-regularization. L1 works well for NL-PER because the number of features/contexts is much larger than the number of training instances.

However, when the number of training instances is larger than the number of features, L2 tends to work better than L1. ElasticNet is included to cover this case by combining the advantages of L1 and L2.
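
Concretely, with L(w) denoting the negative log-likelihood, ElasticNet training minimizes an objective of the form

    J(w) = L(w) + L1Cost * ||w||_1 + (L2Cost / 2) * ||w||_2^2

(up to the exact scaling of the penalty terms used in the patch), so setting either cost to zero recovers plain L1- or L2-regularization.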

I also moved the L-BFGS-based convex optimization solver into the QNMinimizer class so that it can be used for other purposes. A usage example is described in its class documentation.
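
As a rough illustration, minimizing a simple convex function with the standalone solver looks like the sketch below (the Function interface with getDimension/valueAt/gradientAt and the no-arg QNMinimizer constructor shown here follow the quasi-newton package layout and may not match the final patch exactly):

    import opennlp.tools.ml.maxent.quasinewton.Function;
    import opennlp.tools.ml.maxent.quasinewton.QNMinimizer;

    public class QNMinimizerExample {
      public static void main(String[] args) {
        // f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, a smooth convex function
        // with its unique minimum at (3, -1)
        Function f = new Function() {
          public int getDimension() { return 2; }
          public double valueAt(double[] x) {
            return Math.pow(x[0] - 3, 2) + 2 * Math.pow(x[1] + 1, 2);
          }
          public double[] gradientAt(double[] x) {
            return new double[] { 2 * (x[0] - 3), 4 * (x[1] + 1) };
          }
        };

        QNMinimizer minimizer = new QNMinimizer();
        double[] xMin = minimizer.minimize(f);
        // xMin should be close to (3, -1)
        System.out.println(xMin[0] + " " + xMin[1]);
      }
    }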

Finally, I did some code cleanup to make the source code easier to maintain.

> Add L1-regularization into L-BFGS
> ---------------------------------
>
>                 Key: OPENNLP-671
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-671
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Machine Learning
>            Reporter: Vinh Khuc
>         Attachments: L1-ElasticNet-LBFGS.patch, nl-per-testa-l1.log, nl-per-testb-l1.log, nl-per-train-l1.log, qn-trainer-l1.params
>
>
> L1-regularization is useful when training Maximum Entropy models since it pushes the parameters of irrelevant features to zero. Hence, the parameter vector will be sparse and the trained model will be compact.
> When the number of features is much larger than the number of training examples, L1 often gives better accuracy than L2.
> The implementation of L1-regularization for L-BFGS will follow the method described in the paper:
> http://research.microsoft.com/en-us/um/people/jfgao/paper/icml07scalable.pdf



--
This message was sent by Atlassian JIRA
(v6.2#6252)