Posted to issues@hivemall.apache.org by "Makoto Yui (Jira)" <ji...@apache.org> on 2019/11/27 05:35:00 UTC

[jira] [Created] (HIVEMALL-284) Support class weighting in GeneralLearnerBase

Makoto Yui created HIVEMALL-284:
-----------------------------------

             Summary: Support class weighting in GeneralLearnerBase
                 Key: HIVEMALL-284
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-284
             Project: Hivemall
          Issue Type: New Feature
            Reporter: Makoto Yui
             Fix For: 0.7.0


[https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit]
[https://scikit-learn.org/dev/glossary.html#term-class-weight]

Introduce a "-class_weight=[0.1,0.2]" option, or a pair of "-pos_weight=0.2 -neg_weight=0.1" options, so that each class can be weighted during training.

[https://github.com/scikit-learn/scikit-learn/blob/0a7adef0058ef28c7a146734f38161f7c7c581af/sklearn/linear_model/_sgd_fast.pyx#L719]
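For illustration, a minimal sketch of how a per-class weight could enter a single SGD step, loosely following the linked scikit-learn code, where the update is scaled by the weight of the example's class. The function name `sgd_step`, the {-1, +1} label encoding and the plain logistic loss are assumptions of this sketch, not GeneralLearnerBase code:

```python
import math


def sgd_step(w, x, y, eta, pos_weight=1.0, neg_weight=1.0):
    """One SGD step for logistic loss; y is assumed to be -1 or +1."""
    # Pick the weight of the class this example belongs to.
    class_weight = pos_weight if y > 0 else neg_weight
    # Linear prediction p = w . x
    p = sum(wi * xi for wi, xi in zip(w, x))
    # Derivative of log(1 + exp(-y * p)) with respect to p.
    dloss = -y / (1.0 + math.exp(y * p))
    # The class weight simply scales the size of the step.
    update = -eta * dloss * class_weight
    return [wi + update * xi for wi, xi in zip(w, x)]


# The same positive example moves the model twice as far with pos_weight=2.
plain = sgd_step([0.0, 0.0], [1.0, 2.0], 1, eta=0.1)
weighted = sgd_step([0.0, 0.0], [1.0, 2.0], 1, eta=0.1, pos_weight=2.0)
```

Scaling the step, rather than resampling the data, keeps the training input unchanged, which is why the weights are passed as options here.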

With class_weight="balanced", scikit-learn computes the weight of each class y as follows:
> class_weight_y = #samples / (#classes * count_of(y))
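A small worked example of the formula above, in plain Python so that it does not depend on scikit-learn. The helper name `balanced_class_weights` and the toy label list are made up for illustration:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: n_samples / (n_classes * count(label))}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {y: n_samples / (n_classes * c) for y, c in counts.items()}


# 8 negative and 2 positive examples: the rare class gets the larger weight.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

Here 10 / (2 * 8) = 0.625 for the negative class and 10 / (2 * 2) = 2.5 for the positive class, so both classes contribute the same total weight (8 * 0.625 = 2 * 2.5 = 5).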

The same weights can be computed in SQL as follows:
{code:sql}
-- For binary classification (#classes = 2)
WITH weights as (
 select
  -- class_weight_y = #samples / (#classes * count_of(y))
  count(1) / (2 * sum(if(label=0, 1, 0))) as neg_weight,
  count(1) / (2 * sum(if(label=1, 1, 0))) as pos_weight
 from
  train
)
select
  train_classifier(features, label, concat('-pos_weight=', pos_weight, ' -neg_weight=', neg_weight))
from
  train l
  cross join weights r
{code}
 