Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/11/13 19:13:22 UTC
[jira] [Commented] (MAHOUT-1355) Frequent Pattern Mining algorithms for Mahout

    [ https://issues.apache.org/jira/browse/MAHOUT-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821621#comment-13821621 ] 

Suneel Marthi commented on MAHOUT-1355:
---------------------------------------

[~smoens] Thanks for this patch. Some comments based on a very cursory first pass through the code, without considering the actual algorithm and its implementation.

a)  Use Guava APIs where appropriate.

      For example:  Map<Integer,MutableLong> counts = new HashMap<Integer,MutableLong>();

         could be replaced by
           
                   Map<Integer,MutableLong> counts = Maps.newHashMap();

b)  Classes that actually launch MR jobs should extend Mahout's AbstractJob and leverage appropriate methods.

   For example:

    public class BigFIMDriver extends Configured implements Tool
  
   can be replaced by

   public class BigFIMDriver extends AbstractJob

  
    Replace 

          Job job = new Job(conf, "Apriori Phase" + i);

         /// and all of the code that follows this

    by 

        Job bigFmJob = prepareJob(.....);
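To illustrate point a): the factory form is shorter because a static generic method lets the compiler infer the type arguments from the assignment target. The sketch below is only an illustration and makes some assumptions so that it runs without Guava on the classpath: newHashMap() is a local, hypothetical stand-in for Guava's Maps.newHashMap(), and Counter is a hypothetical stand-in for the MutableLong type in the line quoted above. Neither is taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class FactoryInferenceSketch {

    // Hypothetical stand-in for a mutable counter such as MutableLong.
    static final class Counter {
        long value;
    }

    // Hypothetical stand-in for Guava's Maps.newHashMap():
    // K and V are inferred from the declared type of the variable
    // the result is assigned to, so they are not repeated at the call site.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    // Counts how often each item id occurs in the input.
    static Map<Integer, Counter> countItems(int[] items) {
        // Type arguments appear once, on the left-hand side only.
        Map<Integer, Counter> counts = newHashMap();
        for (int item : items) {
            Counter c = counts.get(item);
            if (c == null) {
                c = new Counter();
                counts.put(item, c);
            }
            c.value++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Counter> counts = countItems(new int[] {3, 1, 3, 2, 3});
        System.out.println("item 3 seen " + counts.get(3).value + " times");
    }
}
```

With Guava available, the local newHashMap() helper would simply be dropped in favour of Maps.newHashMap().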


Could you post this patch on Review Board? It would be much easier to comment on and review there.

https://reviews.apache.org


> Frequent Pattern Mining algorithms for Mahout
> ---------------------------------------------
>
>                 Key: MAHOUT-1355
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1355
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.9
>            Reporter: Sandy Moens
>            Priority: Minor
>         Attachments: MAHOUT-1355.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We implemented frequent pattern mining algorithms for Hadoop and adapted them to Mahout. We used "PFP" (now deprecated) as a benchmark and these implementations perform better in terms of speed and memory footprint. The details of the implementations can be found in the paper Frequent Pattern Mining for BigData ( http://adrem.ua.ac.be/bigfim )
> We have been maintaining the project for a while in GitLab ( https://gitlab.com/adrem/bigfim ). Documentation for adaptation ( Readme-Mahout.md ) and usage in mahout ( Mahout-wiki.md ) can be found there.
> We are open to any modification and/or improvement requests to make it more worthwhile for the Mahout project. We, as the research group, volunteer to maintain FPM algorithms as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)