Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/05/22 21:41:00 UTC

[jira] [Updated] (MADLIB-1229) Duplicated result in PageRank output table with grouping

     [ https://issues.apache.org/jira/browse/MADLIB-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1229:
------------------------------------
    Priority: Minor  (was: Major)

> Duplicated result in PageRank output table with grouping
> --------------------------------------------------------
>
>                 Key: MADLIB-1229
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1229
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Graph
>            Reporter: Jingyi Mei
>            Priority: Minor
>             Fix For: v1.15
>
>
> In MADlib 1.13, if I run the following query:
> {code:java}
> DROP TABLE IF EXISTS vertex, "EDGE";
> CREATE TABLE vertex(
> id INTEGER
> );
> CREATE TABLE "EDGE"(
> src INTEGER,
> dest INTEGER,
> user_id INTEGER
> );
> INSERT INTO vertex VALUES
> (0),
> (1),
> (2);
> INSERT INTO "EDGE" VALUES
> (0, 1, 1),
> (0, 2, 1),
> (1, 2, 1),
> (2, 1, 1),
> (0, 1, 2);
> DROP TABLE IF EXISTS pagerank_ppr_grp_out;
> DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
> SELECT pagerank(
> 'vertex', -- Vertex table
> 'id', -- Vertex id column
> '"EDGE"', -- Edge table
> 'src=src, dest=dest', -- Edge args
> 'pagerank_ppr_grp_out', -- Output table of PageRank
> NULL, -- Default damping factor (0.85)
> NULL, -- Default max iters (100)
> NULL, -- Default Threshold 
> 'user_id');{code}
> I get the following result:
> {code:java}
> madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
> user_id | id | pagerank
> ---------+----+-------------------
> 1 | 0 | 0.05
> 1 | 0 | 0.05
> 1 | 1 | 0.614906399170753
> 1 | 2 | 0.614906399170753
> 2 | 0 | 0.075
> 2 | 1 | 0.13875
> (6 rows){code}
> where the row user_id=1, id=0, pagerank=0.05 appears twice.
> We should correct it so that the output table contains only distinct rows.
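>
> A minimal sketch for confirming the duplication, assuming the output table pagerank_ppr_grp_out produced by the query above still exists. The first statement lists every (user_id, id) pair that occurs more than once. The second is only a read-side workaround, not the fix itself, and the alias num_rows is just an illustrative name:
> {code:sql}
> -- List (user_id, id) pairs that occur more than once in the output table.
> SELECT user_id, id, count(*) AS num_rows
> FROM pagerank_ppr_grp_out
> GROUP BY user_id, id
> HAVING count(*) > 1;
> -- Workaround when reading: collapse exact duplicate rows.
> SELECT DISTINCT user_id, id, pagerank
> FROM pagerank_ppr_grp_out
> ORDER BY user_id, id;{code}
> Once the bug is fixed, the first statement should return no rows.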
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)