You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hivemall.apache.org by "Makoto Yui (Jira)" <ji...@apache.org> on 2019/11/27 19:09:00 UTC

[jira] [Comment Edited] (HIVEMALL-280) Support lift/confidence/support UDF

    [ https://issues.apache.org/jira/browse/HIVEMALL-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983834#comment-16983834 ] 

Makoto Yui edited comment on HIVEMALL-280 at 11/27/19 7:08 PM:
---------------------------------------------------------------


{noformat}
supp(x=>y) = |xUy| / tx_total
conf(x=>y) = supp(x=>y)/supp(x) = |xUy| / |x|
lift(x=>y) = conf(x=>y)/supp(y) = (|xUy| / |x|) / (|y| / tx_total)

"A, B, A_count, B_count, AB_count, tx_total"
is required to compute support/confidense/lift for each A-B pair.

SELECT
  A,
  B,
  A_B_count / tx_total as support,
  A_B_count / A_count AS confidence,
  (A_B_count / A_count) / (B_count / tx_total) AS lift 
FROM
  transaction 
ORDER BY
  1, 2;

Another approach is taking the following table as the input.
Transaction table often takes the following form [1]
[1] https://www.philippe-fournier-viger.com/spmf/Apriori.php

txid	items
t1	{1, 3, 4}
t2	{2, 3, 5}
t3	{1, 2, 3, 5}
t4	{2, 5}
t5	{1, 2, 3, 5}

WITH tmp1 as (
 select
   arules(txid, items, '-min_supp 2 -topk 100 -order_by lift') as rules
 from
   transaction
),
tmp2 as (
 select
   explode(rules) rule
 from 
  tmp1
)
select
  rule.item,
  rule.other,
  rule.supp,
  rule.conf,
  rule.lift
from
  tmp2;

explode items internally in UDAF process.
        a  b
    t1 {1, 3}, 
	t1 {1, 4}
	t1 {3, 4}
	t2 {2, 3}
	t2 {2, 5}
	t2 {3, 5}
	....

1. compute tx_total by aggregation 
     long
2. compute item_count (a_count,b_count) by aggregation
     map<item, long>
3. compute ab_count by aggregation 
     map<pair<item,item>, long> 
4. emit (need to limit returning list size by supp/conf/lift value) 
	 ordered_list<struct<item,item,supp,conf,lift>>
{noformat}



was (Author: myui):
supp(x=>y) = |xUy| / tx_total
conf(x=>y) = supp(x=>y)/supp(x) = |xUy| / |x|
lift(x=>y) = conf(x=>y)/supp(y) = (|xUy| / |x|) / (|y| / tx_total)

"A, B, A_count, B_count, AB_count, tx_total"
is required to compute support/confidense/lift for each A-B pair.

SELECT
  A,
  B,
  A_B_count / tx_total as support,
  A_B_count / A_count AS confidence,
  (A_B_count / A_count) / (B_count / tx_total) AS lift 
FROM
  transaction 
ORDER BY
  1, 2;

Another approach is taking the following table as the input.
Transaction table often takes the following form [1]
[1] https://www.philippe-fournier-viger.com/spmf/Apriori.php

txid	items
t1	{1, 3, 4}
t2	{2, 3, 5}
t3	{1, 2, 3, 5}
t4	{2, 5}
t5	{1, 2, 3, 5}

WITH tmp1 as (
 select
   arules(txid, items, '-min_supp 2 -topk 100 -order_by lift') as rules
 from
   transaction
),
tmp2 as (
 select
   explode(rules) rule
 from 
  tmp1
)
select
  rule.item,
  rule.other,
  rule.supp,
  rule.conf,
  rule.lift
from
  tmp2;

explode items internally in UDAF process.
        a  b
    t1 {1, 3}, 
	t1 {1, 4}
	t1 {3, 4}
	t2 {2, 3}
	t2 {2, 5}
	t2 {3, 5}
	....

1. compute tx_total by aggregation 
     long
2. compute item_count (a_count,b_count) by aggregation
     map<item, long>
3. compute ab_count by aggregation 
     map<pair<item,item>, long> 
4. emit (need to limit returning list size by supp/conf/lift value) 
	 ordered_list<struct<item,item,supp,conf,lift>>

> Support lift/confidence/support UDF
> -----------------------------------
>
>                 Key: HIVEMALL-280
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-280
>             Project: Hivemall
>          Issue Type: New Feature
>            Reporter: Makoto Yui
>            Assignee: Makoto Yui
>            Priority: Minor
>
> Support lift/confidence/support UDAF
> [https://en.wikipedia.org/wiki/Lift_(data_mining])
> [https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce]
> [https://medium.com/@samratjain/explained-market-basket-analysis-using-sql-a7434f30e649]
> {code:java}
> select
>   item, other_item,
>   lift(...) as lift,
>   confidence (....) as confidence
> from
>   transaction
> group by
>   1, 2{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)