Posted to user@mahout.apache.org by Vasil Vasilev <va...@gmail.com> on 2011/03/31 18:36:38 UTC
2 bugs in seq2sparse
Hi all,
I was recently experimenting with seq2sparse and found two problems with it:
1. The minLLR parameter is not taken into account. In the
CollocDriver.computeNGramsPruneByLLR method,
    Job job = new Job(conf);
is executed before
    conf.setFloat(LLRReducer.MIN_LLR, minLLRValue);
so the value is set too late for the job to pick it up.
2. The maxDFPercent parameter is not taken into account. In
TFIDFPartialVectorReducer.reduce the check compares the raw ratio against a
percentage:
    if (df / vectorCount > maxDfPercent) {
      if (log.isInfoEnabled()) {
        log.info("ommiting {}", e.index());
      }
      continue;
    }
It should scale the ratio to a percentage first:
    if (df*100 / vectorCount > maxDfPercent) {
      if (log.isInfoEnabled()) {
        log.info("ommiting {}", e.index());
      }
      continue;
    }
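Both problems can be reproduced in isolation. The following is only a minimal sketch, not Mahout or Hadoop code: Conf and Job are hypothetical stand-ins that assume the job takes a copy of its configuration at construction time, the "minLLR" key and the 0.0 default are made up for illustration, and df, vectorCount and maxDfPercent are assumed to be integer (long) values.

```java
import java.util.HashMap;
import java.util.Map;

public class Seq2SparseSketch {

  /** Hypothetical stand-in for a Hadoop-style configuration (not the real class). */
  static class Conf {
    private final Map<String, String> values = new HashMap<>();

    Conf() {}

    /** Copy constructor: takes a snapshot of the other configuration's current values. */
    Conf(Conf other) {
      values.putAll(other.values);
    }

    void setFloat(String key, float value) {
      values.put(key, Float.toString(value));
    }

    float getFloat(String key, float defaultValue) {
      String v = values.get(key);
      return v == null ? defaultValue : Float.parseFloat(v);
    }
  }

  /** Hypothetical stand-in for a job that copies its configuration when constructed. */
  static class Job {
    private final Conf conf;

    Job(Conf conf) {
      this.conf = new Conf(conf); // assumption: later changes to the caller's conf are not seen
    }

    Conf getConfiguration() {
      return conf;
    }
  }

  static final String MIN_LLR = "minLLR"; // hypothetical key name

  /** Order from the report: the job is created first, so its copy misses the value. */
  static float minLlrSeenWhenSetAfterJob(float minLLRValue) {
    Conf conf = new Conf();
    Job job = new Job(conf);
    conf.setFloat(MIN_LLR, minLLRValue);
    return job.getConfiguration().getFloat(MIN_LLR, 0.0f);
  }

  /** Corrected order: the value is set before the job copies the configuration. */
  static float minLlrSeenWhenSetBeforeJob(float minLLRValue) {
    Conf conf = new Conf();
    conf.setFloat(MIN_LLR, minLLRValue);
    Job job = new Job(conf);
    return job.getConfiguration().getFloat(MIN_LLR, 0.0f);
  }

  /** Original check: integer division truncates the ratio, and it is never scaled to a percentage. */
  static boolean exceedsMaxDfOriginal(long df, long vectorCount, long maxDfPercent) {
    return df / vectorCount > maxDfPercent;
  }

  /** Proposed check: scale to a percentage before dividing. */
  static boolean exceedsMaxDfProposed(long df, long vectorCount, long maxDfPercent) {
    return df * 100 / vectorCount > maxDfPercent;
  }

  public static void main(String[] args) {
    System.out.println(minLlrSeenWhenSetAfterJob(5.0f));    // default value, the setting is lost
    System.out.println(minLlrSeenWhenSetBeforeJob(5.0f));   // the configured value
    System.out.println(exceedsMaxDfOriginal(90, 100, 40));  // term in 90% of documents is kept
    System.out.println(exceedsMaxDfProposed(90, 100, 40));  // same term is now pruned
  }
}
```

With the original check, a term that appears in 90 of 100 documents is not pruned at maxDfPercent = 40, because 90 / 100 truncates to 0. With the multiplication by 100 the comparison becomes 90 > 40 and the term is dropped.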
Shall I file JIRAs for these issues? I can also supply a patch.
Regards, Vasil
Re: 2 bugs in seq2sparse
Posted by Sean Owen <sr...@gmail.com>.
It's so small, I'll just file the JIRA and resolve it.
On Thu, Mar 31, 2011 at 6:36 PM, Ted Dunning <te...@gmail.com> wrote:
> Please do.
>
> Can you build tests that demonstrate the problem as part of your patches?
>
> On Thu, Mar 31, 2011 at 9:36 AM, Vasil Vasilev <va...@gmail.com>
> wrote:
>
> > Shall I file JIRAs for these issues? I can also supply a patch.
> >
>
Re: 2 bugs in seq2sparse
Posted by Ted Dunning <te...@gmail.com>.
Please do.
Can you build tests that demonstrate the problem as part of your patches?
On Thu, Mar 31, 2011 at 9:36 AM, Vasil Vasilev <va...@gmail.com> wrote:
> Shall I file JIRAs for these issues? I can also supply a patch.
>