Posted to user@mahout.apache.org by "Ungerer, Jens" <je...@student.kit.edu> on 2012/05/29 15:49:51 UTC
mahout FPGrowth problem
I am using mahout-distribution 0.6. My first test program using mahout FPGrowth with a small data set
worked well (example1.txt).
In my second test program I get this exception:
"Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.mahout.fpm.pfpgrowth.convertors.TransactionIterator$1.apply(TransactionIterator.java:48)
at org.apache.mahout.fpm.pfpgrowth.convertors.TransactionIterator$1.apply(TransactionIterator.java:42)
at com.google.common.collect.Iterators$8.next(Iterators.java:765)
at com.google.common.collect.ForwardingIterator.next(ForwardingIterator.java:48)
at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth.generateTopKFrequentPatterns(FPGrowth.java:290)
at org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth.generateTopKFrequentPatterns(FPGrowth.java:174)
at fpgrowth.Fpgrowth.frequentPatternMining(Fpgrowth.java:82)
at Main.main(Main.java:119)"
Then I added items to the itemsets so that every transaction has the same length.
This also worked well (example2.txt).
I have two questions.
Is it necessary to use itemsets of equal length?
(In my first data set I didn't use itemsets with equal length... )
Is it possible to use itemsets with duplicates in mahout FPGrowth?
https://cwiki.apache.org/confluence/display/MAHOUT/Mailing+Lists,+IRC+and+Archives
"Also, please send questions to this list to verify your problem before filing issues in JIRA."
-> I don't think so, but is my problem perhaps a bug in mahout FPGrowth?
best regards
Jens
AW: mahout FPGrowth problem
Posted by "Ungerer, Jens" <je...@student.kit.edu>.
Hi,
thank you for your response.
I removed the duplicate items and now I don't get an exception.
>> Is it necessary to use itemsets of equal length?
>No - fixed size itemsets are not required.
>> Is it possible to use itemsets with duplicates in mahout FPGrowth?
>Not reliably. This crash looks like it is caused by having more items in
>one particular itemset than in the set of items with at least
>min-support. Perhaps more simply, if you have three items that meet
>your support cutoff, and you encounter an itemset like: "item1 item1
>item2 item3", this will happen.
>It won't crash here in every case where you have multiples of items; for
>example "item1 item1 item2" will not crash. I'm not sure precisely how
>that will be treated down the line, but my assumption is that your
>results would be subtly wrong somehow.
>It would be pretty straightforward to fix this, and either tolerate
>multiples gracefully by collapsing "baskets" into itemsets or bailing
>out with an error. I think we'd just need to fix TransactionIterator
>and also check the initial counting pass (ParallelCountingDriver and its
>non-MR counterpart).
>-tom
regards
Jens
Re: mahout FPGrowth problem
Posted by tom pierce <tc...@apache.org>.
Hi Jens,
> Is it necessary to use itemsets of equal length?
No - fixed size itemsets are not required.
> Is it possible to use itemsets with duplicates in mahout FPGrowth?
Not reliably. This crash looks like it is caused by having more items in
one particular itemset than in the set of items with at least
min-support. Perhaps more simply, if you have three items that meet
your support cutoff, and you encounter an itemset like: "item1 item1
item2 item3", this will happen.
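The failure mode described above can be reproduced with a simplified model. This is only a sketch, not Mahout's actual TransactionIterator code: the class BufferOverflowSketch and the method encode are hypothetical names, and the sketch assumes the encoding buffer is sized to the number of items that meet min-support.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the described behaviour; not taken from Mahout.
class BufferOverflowSketch {

    // Encodes a transaction as item ids, using a buffer that has one slot
    // per frequent item (assumption: this mirrors the sizing described above).
    static int[] encode(List<String> transaction, Map<String, Integer> frequentItemIds) {
        int[] buffer = new int[frequentItemIds.size()];
        int index = 0;
        for (String item : transaction) {
            Integer id = frequentItemIds.get(item);
            if (id != null) {
                // Throws ArrayIndexOutOfBoundsException once the transaction
                // contains more frequent-item occurrences than the buffer has slots.
                buffer[index++] = id;
            }
        }
        return Arrays.copyOf(buffer, index);
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = new HashMap<>();
        ids.put("item1", 0);
        ids.put("item2", 1);
        ids.put("item3", 2);

        // Three occurrences fit into three slots, so this does not crash,
        // but the duplicate id is silently kept in the encoded transaction.
        System.out.println(Arrays.toString(
                encode(Arrays.asList("item1", "item1", "item2"), ids)));

        // Four occurrences do not fit into three slots.
        try {
            encode(Arrays.asList("item1", "item1", "item2", "item3"), ids);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

In this model the first call returns [0, 0, 1], which shows how a duplicate can slip through without an exception.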
It won't crash here in every case where you have multiples of items; for
example "item1 item1 item2" will not crash. I'm not sure precisely how
that will be treated down the line, but my assumption is that your
results would be subtly wrong somehow.
It would be pretty straightforward to fix this, and either tolerate
multiples gracefully by collapsing "baskets" into itemsets or bailing
out with an error. I think we'd just need to fix TransactionIterator
and also check the initial counting pass (ParallelCountingDriver and its
non-MR counterpart).
-tom
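Until such a fix exists in the library, the two options mentioned above (collapsing baskets into itemsets, or bailing out with an error) can be applied on the caller's side before the transactions are handed to FPGrowth. The following is a minimal sketch using only the JDK; TransactionCleaner, collapse and requireNoDuplicates are hypothetical names, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-processing helper; not part of Mahout.
class TransactionCleaner {

    // Option 1: collapse a basket into an itemset by dropping repeated items.
    // LinkedHashSet keeps the first occurrence of each item in its original order.
    static List<String> collapse(List<String> basket) {
        return new ArrayList<>(new LinkedHashSet<>(basket));
    }

    // Option 2: refuse a basket that contains the same item more than once.
    static void requireNoDuplicates(List<String> basket) {
        Set<String> seen = new HashSet<>();
        for (String item : basket) {
            if (!seen.add(item)) {
                throw new IllegalArgumentException("Duplicate item in transaction: " + item);
            }
        }
    }

    public static void main(String[] args) {
        List<String> basket = Arrays.asList("item1", "item1", "item2", "item3");
        System.out.println(collapse(basket)); // prints [item1, item2, item3]
    }
}
```

Collapsing is the more forgiving choice; failing fast is preferable when a duplicate indicates a problem in the data preparation step.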