Posted to user@spark.apache.org by Alexis Gillain <al...@googlemail.com> on 2015/08/20 11:00:26 UTC

MLlib Prefixspan implementation

I want to use PrefixSpan, so I had a look at the code and the cited paper,
"Distributed PrefixSpan Algorithm Based on MapReduce".

There is a result in the paper that I didn't really understand, and I
couldn't find where it is used in the code.

Suppose a sequence database S = {1,2...n}, a sequence <a...> is a
length-(L-1) (2≤L≤n) sequential pattern, in projected databases which is a
prefix of a length-(L-1) sequential pattern <a...a>, when the support count
of <a> is not less than min_support, it is equal to obtaining a length-L
sequential pattern < a ... a > from projected databases that obtaining a
length-L sequential pattern < a ... a > from a sequence database S.

According to the paper, this result is supposed to add a pruning step in the
reduce function, but I couldn't find where.
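
For context, this is how I understand plain prefix projection and min_support
filtering in PrefixSpan. It is only a minimal sketch: every element is a
single item, the function names are my own and not MLlib's, and it does not
include the extra pruning step from the paper, which is the part I can't
locate.

```python
from collections import Counter


def project(database, item):
    """Keep the suffix that follows the first occurrence of `item` in each sequence."""
    projected = []
    for seq in database:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend the prefix any further
                projected.append(suffix)
    return projected


def frequent_items(database, min_support):
    """Count each item once per sequence and keep those reaching min_support."""
    counts = Counter()
    for seq in database:
        counts.update(set(seq))
    return {item: c for item, c in counts.items() if c >= min_support}


def prefixspan(database, min_support, prefix=()):
    """Grow patterns recursively; returns {pattern tuple: support count}."""
    patterns = {}
    for item, support in sorted(frequent_items(database, min_support).items()):
        new_prefix = prefix + (item,)
        patterns[new_prefix] = support
        # Length-L patterns are mined only from the projected database
        # of the length-(L-1) prefix, never from the full database again.
        patterns.update(prefixspan(project(database, item), min_support, new_prefix))
    return patterns


# Hypothetical toy database, not taken from the paper.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
result = prefixspan(db, min_support=2)
```

With min_support = 2 on this toy database, <a b>, <a c> and <b c> each have
support 2, while <a b c> is dropped because it occurs in only one sequence.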

This result seems to come from an earlier paper, "Wang Linlin, Fan Jun.
Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan [J].
Computer Engineering, 2009, 35(23): 56-61", but that didn't help me
understand it or see how it can improve the algorithm.
-- 
Alexis GILLAIN