Posted to user@spark.apache.org by "Devi P.V" <de...@gmail.com> on 2017/02/02 07:17:22 UTC

FP growth - Items in a transaction must be unique

Hi all,

I am trying to run the FP-growth algorithm using Spark and Scala. A sample of
the input DataFrame is as follows:

+-------------------------------------------------------------------------------------------+
|productName                                                                                |
+-------------------------------------------------------------------------------------------+
|Apple Iphone 7 128GB Jet Black with Facetime                                               |
|Levi’s Blue Slim Fit Jeans- L5112,Rimmel London Lasting Finish Matte by Kate Moss 101 Dusky|
|Iphone 6 Plus (5.5",Limited Stocks, TRA Oman Approved)                                     |
+-------------------------------------------------------------------------------------------+

Each row contains unique items.

I converted it into an RDD as follows:

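import org.apache.spark.mllib.fpm.FPGrowth

// Split each row on commas to form one transaction per row
val transactions = names.as[String].rdd.map(s => s.split(","))

val fpg = new FPGrowth().
  setMinSupport(0.3).
  setNumPartitions(100)

val model = fpg.run(transactions)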

But I got the following error:

WARN TaskSetManager: Lost task 2.0 in stage 27.0 (TID 622, localhost):
org.apache.spark.SparkException:
Items in a transaction must be unique but got WrappedArray(
Huawei GR3 Dual Sim 16GB 13MP 5Inch 4G,
 Huawei G8 Gold 32GB,  4G,
5.5 Inches, HTC Desire 816 (Dual Sim, 3G, 8GB),
 Samsung Galaxy S7 Single Sim - 32GB,  4G LTE,
Gold, Huawei P8 Lite 16GB,  4G LTE, Huawei Y625,
Samsung Galaxy Note 5 - 32GB,  4G LTE,
Samsung Galaxy S7 Dual Sim - 32GB)


How can I solve this?


Thanks

Re: FP growth - Items in a transaction must be unique

Posted by Patrick Plaatje <pa...@bazana.com>.
Hi,

 

This indicates that you have duplicate products within a row of your DataFrame. The FPGrowth implementation only allows unique products per row, so you will need to remove the duplicates from each row before running the algorithm.
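
As a rough illustration of the deduplication step, here is a minimal sketch in plain Scala. The object name DedupeExample and the helper uniqueItems are hypothetical (they are not part of Spark), and the sketch assumes that trimming whitespace and dropping empty fragments is acceptable for this data. Note that in the error message above, the repeated entries look like fragments such as " 4G LTE", which suggests that some product names themselves contain commas; deduplicating would silence the exception, but those names would still be split into pieces, so a different delimiter may be needed upstream.

```scala
object DedupeExample {
  // Hypothetical helper (not a Spark API): split a comma-separated row into
  // trimmed, non-empty items and drop repeats, keeping first-seen order.
  def uniqueItems(row: String): Array[String] =
    row.split(",").map(_.trim).filter(_.nonEmpty).distinct

  def main(args: Array[String]): Unit = {
    // Sample row built from fragments shown in the error message.
    val sample = "Huawei G8 Gold 32GB,  4G, Huawei P8 Lite 16GB,  4G"
    println(uniqueItems(sample).mkString(" | "))
  }
}
```

Assuming the names DataFrame and the fpg instance from the original post, the helper could be plugged in as names.as[String].rdd.map(DedupeExample.uniqueItems) before calling fpg.run on the result.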

 

Best,

Patrick

 
