Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/11/13 19:13:22 UTC
[jira] [Commented] (MAHOUT-1355) Frequent Pattern Mining algorithms for Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821621#comment-13821621 ]
Suneel Marthi commented on MAHOUT-1355:
---------------------------------------
[~smoens] Thanks for this patch. Here are some comments based on a very cursory first pass through the code, without considering the actual algorithm or its implementation.
a) Use Guava APIs where appropriate.
For example,
Map<Integer,MutableLong> counts = new HashMap<Integer,MutableLong>();
could be replaced by
Map<Integer,MutableLong> counts = Maps.newHashMap();
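To illustrate point a): Guava's Maps.newHashMap() is a static generic factory, so the compiler infers the type arguments from the declaration instead of having them repeated on the right-hand side. A minimal sketch of that idiom follows. Nothing in it is taken from the patch: ItemCounts and countItems are hypothetical names, and newHashMap and MutableLong are local stand-ins for the library classes, included only so that the example compiles without extra jars on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

final class ItemCounts {

  /** Minimal stand-in for a MutableLong counter (assumption, not the library class). */
  static final class MutableLong {
    private long value;

    void increment() {
      value++;
    }

    long longValue() {
      return value;
    }
  }

  /** Hypothetical stand-in with the same shape as Guava's Maps.newHashMap(). */
  static <K, V> Map<K, V> newHashMap() {
    return new HashMap<K, V>();
  }

  /** Counts how often each item id occurs in the given array. */
  static Map<Integer, MutableLong> countItems(int[] items) {
    // K and V are inferred from the declared type on the left-hand side.
    Map<Integer, MutableLong> counts = newHashMap();
    for (int item : items) {
      MutableLong count = counts.get(item);
      if (count == null) {
        count = new MutableLong();
        counts.put(item, count);
      }
      count.increment();
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<Integer, MutableLong> counts = countItems(new int[] {1, 2, 2, 3, 2});
    System.out.println("item 2 seen " + counts.get(2).longValue() + " times");
  }
}
```

In the patch itself the two stand-ins would go away: the factory call becomes Maps.newHashMap() from com.google.common.collect.Maps, and MutableLong is whichever class the patch already imports.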
b) Classes that launch MapReduce jobs should extend Mahout's AbstractJob and use its helper methods where appropriate.
For example,
public class BigFIMDriver extends Configured implements Tool
can be replaced by
public class BigFIMDriver extends AbstractJob
Similarly, replace
Job job = new Job(conf, "Apriori Phase" + i);
// and all of the code that follows this
by
Job bigFmJob = prepareJob(.....);
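As a rough sketch of point b), a driver built on AbstractJob might look like the following. This is not the code from the patch: BigFIMDriverSketch is a hypothetical name, Hadoop's identity Mapper and Reducer stand in for the real BigFIM classes, and the input/output formats and key/value types are placeholders. The prepareJob overload and the option helpers used here are assumed from the Mahout 0.8/0.9 API and should be checked against trunk. Compiling it requires the Hadoop and Mahout jars on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;

// Hypothetical driver; the real mapper/reducer classes come from the patch.
class BigFIMDriverSketch extends AbstractJob {

  @Override
  public int run(String[] args) throws Exception {
    // Register the standard --input and --output options and parse them.
    addInputOption();
    addOutputOption();
    if (parseArguments(args) == null) {
      return -1; // help was requested or the arguments were invalid
    }

    // One call replaces the manual "new Job(conf, ...)" setup code.
    // The identity Mapper/Reducer are placeholders for illustration only.
    Job job = prepareJob(getInputPath(), getOutputPath(),
        TextInputFormat.class,
        Mapper.class, LongWritable.class, Text.class,
        Reducer.class, LongWritable.class, Text.class,
        TextOutputFormat.class);

    return job.waitForCompletion(true) ? 0 : -1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new BigFIMDriverSketch(), args));
  }
}
```

Because AbstractJob already extends Configured and implements Tool, the driver keeps working with ToolRunner while gaining the shared option parsing.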
Could you post this patch on Review Board? It would be much easier to comment on and review there.
https://reviews.apache.org
> Frequent Pattern Mining algorithms for Mahout
> ---------------------------------------------
>
> Key: MAHOUT-1355
> URL: https://issues.apache.org/jira/browse/MAHOUT-1355
> Project: Mahout
> Issue Type: New Feature
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.9
> Reporter: Sandy Moens
> Priority: Minor
> Attachments: MAHOUT-1355.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> We implemented frequent pattern mining algorithms for Hadoop and adapted them to Mahout. We used "PFP" (now deprecated) as a benchmark and these implementations perform better in terms of speed and memory footprint. The details of the implementations can be found in the paper Frequent Pattern Mining for BigData ( http://adrem.ua.ac.be/bigfim )
> We have been maintaining the project for a while in GitLab ( https://gitlab.com/adrem/bigfim ). Documentation for adaptation ( Readme-Mahout.md ) and usage in Mahout ( Mahout-wiki.md ) can be found there.
> We are open to any modification and/or improvement requests to make it more worthwhile for the Mahout project. We, as the research group, volunteer to maintain FPM algorithms as well.