Posted to dev@mahout.apache.org by "Maurizio (JIRA)" <ji...@apache.org> on 2008/06/28 04:15:45 UTC
[jira] Issue Comment Edited: (MAHOUT-9) Implement MapReduce BayesianClassifier
[ https://issues.apache.org/jira/browse/MAHOUT-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608964#action_12608964 ]
maurizio316 edited comment on MAHOUT-9 at 6/27/08 7:14 PM:
--------------------------------------------------------
Hi Grant,
I'm developing something like your application and I found your code really interesting.
I may be missing something, but I think your Bayesian approach does not work correctly.
Specifically, weightedFeatureProbability computes:
((weight * defaultProb) + (totalNumSeen * unweighted)) / (weight + totalNumSeen)
where unweighted = numSeen / labelCount
and, in turn,
numSeen = number of times the feature has been seen within the given label
and
labelCount = number of features under the label
If you look at how this function behaves, you will see that:
- terms never seen before are "heavier" than the others.
- unweighted is a very small number, so its contribution to the probability is insignificant. Moreover, for a widespread term the numerator grows more slowly than the denominator.
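To make the two observations above concrete, here is a minimal Java sketch of the formula as written in this comment. It is not the code from the attached patch: the class name, the method signature, the values weight = 1.0 and defaultProb = 0.5, and the reading of totalNumSeen as the feature's count across all labels are assumptions made only for illustration.

```java
/**
 * Sketch of the formula quoted above. NOT the patch code: the class name,
 * signature, and all sample values below are assumptions for illustration.
 */
public class WeightedProbabilitySketch {

    public static double weightedFeatureProbability(double numSeen,
                                                    double labelCount,
                                                    double totalNumSeen,
                                                    double weight,
                                                    double defaultProb) {
        // unweighted = numSeen / labelCount, as defined in the comment
        double unweighted = numSeen / labelCount;
        return ((weight * defaultProb) + (totalNumSeen * unweighted))
                / (weight + totalNumSeen);
    }

    public static void main(String[] args) {
        double weight = 1.0;          // assumed
        double defaultProb = 0.5;     // assumed
        double labelCount = 100000;   // hypothetical number of features under the label

        // Term never seen: the result is exactly defaultProb (0.5).
        double unseen = weightedFeatureProbability(0, labelCount, 0, weight, defaultProb);
        // Term seen 10 times under this label, 20 times overall (assumed meaning of totalNumSeen).
        double seen = weightedFeatureProbability(10, labelCount, 20, weight, defaultProb);
        // Widespread term: 500 times under this label, 5000 times overall.
        double widespread = weightedFeatureProbability(500, labelCount, 5000, weight, defaultProb);

        System.out.printf("unseen=%.6f seen=%.6f widespread=%.6f%n", unseen, seen, widespread);
    }
}
```

With these assumed inputs the unseen term scores higher than both observed terms, and the widespread term scores lowest, which is the behaviour described in the two points above.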
What do you think?
P.S.: Sorry for my bad English.
> Implement MapReduce BayesianClassifier
> --------------------------------------
>
> Key: MAHOUT-9
> URL: https://issues.apache.org/jira/browse/MAHOUT-9
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch
>
>
> Implement a Bayesian classifier using M/R.
> I have a simple trainer done (not M/R) and will implement the classifier soon, then will upgrade it to use Hadoop.