Posted to issues@spark.apache.org by "Arseniy Tashoyan (JIRA)" <ji...@apache.org> on 2018/01/30 10:23:00 UTC

[jira] [Updated] (SPARK-23269) FP-growth: Provide last transaction for each detected frequent pattern

     [ https://issues.apache.org/jira/browse/SPARK-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arseniy Tashoyan updated SPARK-23269:
-------------------------------------
    Description: 
The FP-growth implementation returns frequent patterns and their frequencies:

_model.freqItemsets_:
||items||freq||
|[5]|3|
|[5, 1]|3|

It would be great to know when each pattern last occurred: which is the most recent transaction containing this pattern?

To support this, FPGrowth would need to be told which column of the transactions data frame holds the timestamp:
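{code:scala}
import org.apache.spark.ml.fpm.FPGrowth

// setTimestampCol is the setter proposed in this issue; it does not exist in Spark yet.
val fpgrowth = new FPGrowth()
  .setItemsCol("items")
  .setTimestampCol("timestamp")
{code}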
The data frame with patterns could then look like this:
||items||freq||lastOccurrence||
|[5]|3|2018-01-01 12:15:00|
|[5, 1]|3|2018-01-01 12:15:00|

Without this functionality, one has to traverse the transactions data frame again with the set of detected patterns and determine the last transaction for each pattern. Why traverse the transactions a second time when FP-growth has already done so during execution?
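
For illustration, the current workaround could be sketched roughly as follows. This is only a sketch under assumptions: it targets the existing spark.ml FPGrowth API, the sample transactions are made up, and the names containsAll, patterns and lastOccurrences are hypothetical helpers rather than anything Spark provides.
{code:scala}
import java.sql.Timestamp

import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, max, udf}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fp-growth-last-occurrence")
  .getOrCreate()
import spark.implicits._

// Made-up sample transactions: an array of items plus a timestamp.
val transactions = Seq(
  (Seq(5, 1), Timestamp.valueOf("2018-01-01 12:00:00")),
  (Seq(5, 1, 2), Timestamp.valueOf("2018-01-01 12:05:00")),
  (Seq(5, 1), Timestamp.valueOf("2018-01-01 12:15:00")),
  (Seq(3), Timestamp.valueOf("2018-01-01 12:20:00"))
).toDF("items", "timestamp")

val model = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.5)
  .fit(transactions)

// Rename to avoid a clash with the "items" column of the transactions.
val patterns = model.freqItemsets.withColumnRenamed("items", "pattern")

// Hypothetical helper: true when the transaction contains every item of the pattern.
val containsAll = udf { (items: Seq[Int], pattern: Seq[Int]) =>
  pattern.forall(item => items.contains(item))
}

// Second pass over the transactions: match each one against all patterns,
// then keep the latest timestamp per pattern.
val lastOccurrences = transactions
  .join(broadcast(patterns), containsAll($"items", $"pattern"))
  .groupBy($"pattern", $"freq")
  .agg(max($"timestamp").as("lastOccurrence"))

lastOccurrences.show(false)
{code}
The join compares every transaction with every detected pattern, which is exactly the extra pass over the data that a timestamp-aware FPGrowth could avoid.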

 

  was:
FP-growth implementation gives patterns and their frequences:

_model.freqItemsets_:
||items||freq||
|[5]|3|
|[5, 1]|3|

It would be great to know when each pattern occurred last time - what it the last transaction having this pattern.

To do so, it will be necessary to tell FPGrowth what is the timestamp column in the transactions data frame:
{code:java}
val fpgrowth = new FPGrowth()
  .setItemsCol("items")
  .setTimestampCol("timestamp")
{code}
So the data frame with patterns could look like:
||items||freq||lastOccurrence||
|[5]|3|2018-01-01 12:15:00|
|[5, 1]|3|2018-01-01 12:15:00|

Without this functionality, it is necessary to traverse the transactions data frame with the set of detected patterns and determine the last transaction for each pattern. Why traverse transactions once again if it has been already done in FP-growth execution?

 


> FP-growth: Provide last transaction for each detected frequent pattern
> ----------------------------------------------------------------------
>
>                 Key: SPARK-23269
>                 URL: https://issues.apache.org/jira/browse/SPARK-23269
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.1
>            Reporter: Arseniy Tashoyan
>            Priority: Major
>              Labels: MLlib, fp-growth
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> The FP-growth implementation returns frequent patterns and their frequencies:
> _model.freqItemsets_:
> ||items||freq||
> |[5]|3|
> |[5, 1]|3|
> It would be great to know when each pattern last occurred: which is the most recent transaction containing this pattern?
> To support this, FPGrowth would need to be told which column of the transactions data frame holds the timestamp:
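> {code:scala}
> import org.apache.spark.ml.fpm.FPGrowth
>
> // setTimestampCol is the setter proposed in this issue; it does not exist in Spark yet.
> val fpgrowth = new FPGrowth()
>   .setItemsCol("items")
>   .setTimestampCol("timestamp")
> {code}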
> The data frame with patterns could then look like this:
> ||items||freq||lastOccurrence||
> |[5]|3|2018-01-01 12:15:00|
> |[5, 1]|3|2018-01-01 12:15:00|
> Without this functionality, one has to traverse the transactions data frame again with the set of detected patterns and determine the last transaction for each pattern. Why traverse the transactions a second time when FP-growth has already done so during execution?
>  



