Posted to user@mahout.apache.org by Kyoung Deok Kwon <kk...@gmail.com> on 2019/04/17 12:29:03 UTC

PFPGrowth fList maximum available capacity

Hello, Mahout members!
First of all, please understand that I am not good at English.

I am going to use Mahout PFPGrowth for my project.
As I understand it, parallel counting runs through Hadoop MapReduce in
`startParallelCounting()`, and then grouping starts in `readFList()`.
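
To make that flow concrete, here is a minimal single-machine sketch in plain
Java of counting items and then building a frequency-ordered list from the
counts. It is an illustration only, not Mahout's code: the class `FListSketch`,
the method `buildFList`, and the alphabetical tie-break are hypothetical, and
it assumes each transaction lists an item at most once.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FListSketch {
    // Hypothetical helper: count how many transactions contain each item,
    // drop items below minSupport, and sort the rest by descending count.
    static List<Map.Entry<String, Long>> buildFList(List<List<String>> transactions,
                                                    long minSupport) {
        // Counting step (done in parallel by MapReduce in the real job).
        Map<String, Long> counts = new HashMap<>();
        for (List<String> transaction : transactions) {
            for (String item : transaction) {
                counts.merge(item, 1L, Long::sum);
            }
        }

        // Keep only frequent items; the whole list is held in memory.
        List<Map.Entry<String, Long>> fList = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() >= minSupport) {
                fList.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            }
        }

        // Most frequent first; ties broken alphabetically (an assumption).
        fList.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return fList;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("a", "c"),
                Arrays.asList("a", "b", "d"));
        System.out.println(buildFList(transactions, 2));
    }
}
```

The point of the sketch is that the list grows with the number of distinct
frequent items, so raising the minimum support is one way to shrink it.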

In `readFList()`, fList is declared with Lists.newArrayList().
I expect fList to hold hundreds of millions of entries. Can I use it at
that size?
I expect a stack overflow. Is it not designed for the case where fList is large?
Am I understanding this correctly?

Thanks in advance.

Re: PFPGrowth fList maximum available capacity

Posted by Andrew Musselman <ak...@apache.org>.
Hi Kyoung, I took a look at that package; I've never used it but there is
an example which uses it:

$ pwd
/home/akm/src/mahout/community/mahout-mr/mr-examples/src/main/java/org/apache/mahout/fpm/pfpgrowth
$ ls
dataset  DeliciousTagsExample.java

You may want to try that and see if the input data looks anything like what
you're planning to use it for in terms of list sizes; otherwise you could
try running it and seeing what happens.
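
For a rough sense of scale before running anything, a back-of-the-envelope
check along these lines may help. It is only a sketch: the class
`FListHeapEstimate`, the 300 million entry count, and the 100 bytes per entry
are placeholder assumptions, not measurements of Mahout. The two facts it
relies on are that a Java ArrayList is indexed by int, so it cannot hold more
than Integer.MAX_VALUE elements, and that its contents live on the heap, so an
oversized list would typically fail with an OutOfMemoryError rather than a
stack overflow.

```java
class FListHeapEstimate {
    private static final double GIB = 1024.0 * 1024.0 * 1024.0;

    // Rough heap need: entry count times an assumed per-entry cost in bytes.
    static long estimateBytes(long entries, long assumedBytesPerEntry) {
        return Math.multiplyExact(entries, assumedBytesPerEntry);
    }

    // An ArrayList is int-indexed, so this is an upper bound on its size
    // (the practical array limit on a given JVM can be slightly lower).
    static boolean fitsInArrayList(long entries) {
        return entries <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long entries = 300_000_000L; // placeholder for "hundreds of millions"
        long bytesPerEntry = 100L;   // assumption; measure on your own JVM and data

        long needed = estimateBytes(entries, bytesPerEntry);
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.printf("fits ArrayList index range: %b%n", fitsInArrayList(entries));
        System.out.printf("estimated heap needed: %.1f GiB%n", needed / GIB);
        System.out.printf("JVM max heap (-Xmx):   %.1f GiB%n", maxHeap / GIB);
    }
}
```

If the estimate comes out well above the heap available to the JVM that calls
`readFList()`, a higher minimum support or a larger heap would be the first
things to try.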

Please let us know if you run into issues. It would be great to move that
package off map-reduce and into the newer framework; if you wanted to take
a look at that I'm sure we could give you some pointers along the way.

Best
Andrew


