Posted to issues@madlib.apache.org by "Jingyi Mei (JIRA)" <ji...@apache.org> on 2017/08/30 17:00:02 UTC

[jira] [Comment Edited] (MADLIB-1124) Graph - HITS algorithm

    [ https://issues.apache.org/jira/browse/MADLIB-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147605#comment-16147605 ] 

Jingyi Mei edited comment on MADLIB-1124 at 8/30/17 4:59 PM:
-------------------------------------------------------------

To answer questions from [~fmcquillan]:
1. Threshold. Yes, it should be optional. Because we normalize both the authority and hub scores, the valid threshold range is \[0,1\]. Currently there is no strong evidence that the authority and hub scores need different thresholds, so we decided to use 1/(number of vertices * 1000) as the default threshold. Here is the new description for threshold:
{code}
Threshold (optional): FLOAT8, default: 1/(number of vertices * 1000). If, between two consecutive iterations, the change in both scores (Authority and Hub) is smaller than 'threshold' for every vertex, or the iteration number exceeds 'max_iter', the computation stops. If you set the threshold to zero, the algorithm is forced to run for the full number of iterations specified in 'max_iter'. The threshold must be set to a value less than or equal to 1, since both scores (Authority and Hub) of every node are initialized to 1. Note that both the Authority and the Hub differences must be below the threshold for the algorithm to stop.
{code}
2. HITS doesn't assign different 'weight' or 'importance' to different nodes, so it shouldn't rely on eigenvector centrality.
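For illustration, here is a minimal Python sketch of the stopping rule described in item 1. It is not MADlib's implementation: the function `hits`, its parameters, and the in-memory edge list are hypothetical, and L2 normalization is an assumption (the comment only says both scores are normalized, not which norm is used).

```python
import math


def hits(edges, num_vertices, max_iter=100, threshold=None):
    """Hypothetical HITS sketch (not MADlib's API).

    edges: list of (src, dst) pairs with vertex ids in range(num_vertices).
    Returns (authority, hub, iterations_run).
    """
    if threshold is None:
        # Default from the description: 1 / (number of vertices * 1000).
        threshold = 1.0 / (num_vertices * 1000)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")

    # Both scores start at 1 for every vertex.
    authority = [1.0] * num_vertices
    hub = [1.0] * num_vertices

    for it in range(1, max_iter + 1):
        # Authority of v: sum of hub scores of the vertices pointing to v.
        new_auth = [0.0] * num_vertices
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub of v: sum of the updated authority scores of the vertices v points to.
        new_hub = [0.0] * num_vertices
        for src, dst in edges:
            new_hub[src] += new_auth[dst]

        # Assumed L2 normalization, which keeps every score in [0, 1].
        for scores in (new_auth, new_hub):
            norm = math.sqrt(sum(s * s for s in scores)) or 1.0
            for i in range(num_vertices):
                scores[i] /= norm

        # Largest per-vertex change of each score since the previous iteration.
        auth_delta = max(abs(a - b) for a, b in zip(new_auth, authority))
        hub_delta = max(abs(a - b) for a, b in zip(new_hub, hub))
        authority, hub = new_auth, new_hub

        # Stop only when BOTH scores changed by less than threshold everywhere.
        # With threshold == 0 this never holds, so all max_iter iterations run.
        if auth_delta < threshold and hub_delta < threshold:
            return authority, hub, it

    return authority, hub, max_iter


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2)]
    auth, hub, n = hits(edges, 3)
    print(n, [round(a, 3) for a in auth], [round(h, 3) for h in hub])
```

Because the test is a strict "smaller than" on both deltas, passing threshold=0 reproduces the "run for the full 'max_iter'" behavior described above without any special case.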



> Graph - HITS algorithm
> ----------------------
>
>                 Key: MADLIB-1124
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1124
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Graph
>            Reporter: Frank McQuillan
>            Assignee: Jingyi Mei
>             Fix For: v2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)