You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hivemall.apache.org by "Makoto Yui (Jira)" <ji...@apache.org> on 2019/11/27 19:09:00 UTC
[jira] [Comment Edited] (HIVEMALL-280) Support
lift/confidence/support UDF
[ https://issues.apache.org/jira/browse/HIVEMALL-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983834#comment-16983834 ]
Makoto Yui edited comment on HIVEMALL-280 at 11/27/19 7:08 PM:
---------------------------------------------------------------
{noformat}
supp(x=>y) = |xUy| / tx_total
conf(x=>y) = supp(x=>y)/supp(x) = |xUy| / |x|
lift(x=>y) = conf(x=>y)/supp(y) = (|xUy| / |x|) / (|y| / tx_total)
"A, B, A_count, B_count, AB_count, tx_total"
is required to compute support/confidense/lift for each A-B pair.
SELECT
A,
B,
A_B_count / tx_total as support,
A_B_count / A_count AS confidence,
(A_B_count / A_count) / (B_count / tx_total) AS lift
FROM
transaction
ORDER BY
1, 2;
Another approach is taking the following table as the input.
Transaction table often takes the following form [1]
[1] https://www.philippe-fournier-viger.com/spmf/Apriori.php
txid items
t1 {1, 3, 4}
t2 {2, 3, 5}
t3 {1, 2, 3, 5}
t4 {2, 5}
t5 {1, 2, 3, 5}
WITH tmp1 as (
select
arules(txid, items, '-min_supp 2 -topk 100 -order_by lift') as rules
from
transaction
),
tmp2 as (
select
explode(rules) rule
from
tmp1
)
select
rule.item,
rule.other,
rule.supp,
rule.conf,
rule.lift
from
tmp2;
explode items internally in UDAF process.
a b
t1 {1, 3},
t1 {1, 4}
t1 {3, 4}
t2 {2, 3}
t2 {2, 5}
t2 {3, 5}
....
1. compute tx_total by aggregation
long
2. compute item_count (a_count,b_count) by aggregation
map<item, long>
3. compute ab_count by aggregation
map<pair<item,item>, long>
4. emit (need to limit returning list size by supp/conf/lift value)
ordered_list<struct<item,item,supp,conf,lift>>
{noformat}
was (Author: myui):
supp(x=>y) = |xUy| / tx_total
conf(x=>y) = supp(x=>y)/supp(x) = |xUy| / |x|
lift(x=>y) = conf(x=>y)/supp(y) = (|xUy| / |x|) / (|y| / tx_total)
"A, B, A_count, B_count, AB_count, tx_total"
is required to compute support/confidense/lift for each A-B pair.
SELECT
A,
B,
A_B_count / tx_total as support,
A_B_count / A_count AS confidence,
(A_B_count / A_count) / (B_count / tx_total) AS lift
FROM
transaction
ORDER BY
1, 2;
Another approach is taking the following table as the input.
Transaction table often takes the following form [1]
[1] https://www.philippe-fournier-viger.com/spmf/Apriori.php
txid items
t1 {1, 3, 4}
t2 {2, 3, 5}
t3 {1, 2, 3, 5}
t4 {2, 5}
t5 {1, 2, 3, 5}
WITH tmp1 as (
select
arules(txid, items, '-min_supp 2 -topk 100 -order_by lift') as rules
from
transaction
),
tmp2 as (
select
explode(rules) rule
from
tmp1
)
select
rule.item,
rule.other,
rule.supp,
rule.conf,
rule.lift
from
tmp2;
explode items internally in UDAF process.
a b
t1 {1, 3},
t1 {1, 4}
t1 {3, 4}
t2 {2, 3}
t2 {2, 5}
t2 {3, 5}
....
1. compute tx_total by aggregation
long
2. compute item_count (a_count,b_count) by aggregation
map<item, long>
3. compute ab_count by aggregation
map<pair<item,item>, long>
4. emit (need to limit returning list size by supp/conf/lift value)
ordered_list<struct<item,item,supp,conf,lift>>
> Support lift/confidence/support UDF
> -----------------------------------
>
> Key: HIVEMALL-280
> URL: https://issues.apache.org/jira/browse/HIVEMALL-280
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Makoto Yui
> Assignee: Makoto Yui
> Priority: Minor
>
> Support lift/confidence/support UDAF
> [https://en.wikipedia.org/wiki/Lift_(data_mining])
> [https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce]
> [https://medium.com/@samratjain/explained-market-basket-analysis-using-sql-a7434f30e649]
> {code:java}
> select
> item, other_item,
> lift(...) as lift,
> confidence (....) as confidence
> from
> transaction
> group by
> 1, 2{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)