You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hivemall.apache.org by "Takuya Kitazawa (JIRA)" <ji...@apache.org> on 2017/05/31 09:18:04 UTC

[jira] [Commented] (HIVEMALL-19) Support DIMSUM for approx all-pairs similarity computation

    [ https://issues.apache.org/jira/browse/HIVEMALL-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030877#comment-16030877 ] 

Takuya Kitazawa commented on HIVEMALL-19:
-----------------------------------------

{code:title=DIMSUM|theme=FadeToGrey|linenumbers=true|language=sql|firstline=0001|collapse=true}
WITH c_j as (
  select
    to_map(j,l2norm) as l2norm
  from (
    select 
      user as j, 
      l2_norm(ln(purchase_count+1))) as l2norm -- UDAF
    from 
      user_purchased
    group by
      user
    ) t0
),
t1 as (
  select
    item as i,
    collect_list(
      feature(userid, ln(purchase_count+1))
    ) as ri -- array(u1:0.1,u2:1.1)
  from
    user_purchased
  group by
    item
),
t2 as (
  select
    dimsum_mapper(r_i, c_j.l2norm) -- UDTF
    -- dimsum_mapper(r_i, map_get_values_as_list(c_j.l2norm, extract_features_as_list(r_i)) 
      as (j, k, v_jk)
  from
    t1
    CROSS JOIN c_j
)
select
  j, k,
  sum(v_jk) as similarity
from 
  t2
group by
  j, k
{code}

> Support DIMSUM for approx all-pairs similarity computation
> ----------------------------------------------------------
>
>                 Key: HIVEMALL-19
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-19
>             Project: Hivemall
>          Issue Type: Sub-task
>            Reporter: Makoto Yui
>            Assignee: Takuya Kitazawa
>            Priority: Minor
>
> Support DIMSUM for approx all-pairs similarity computation.
> https://blog.twitter.com/2014/all-pairs-similarity-via-dimsum
> http://www.jmlr.org/papers/volume14/bosagh-zadeh13a/bosagh-zadeh13a.pdf
> https://github.com/alsoltani/DimSum



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)