Posted to issues@spark.apache.org by "Khaled Ammar (JIRA)" <ji...@apache.org> on 2015/10/06 15:58:26 UTC

[jira] [Comment Edited] (SPARK-10945) GraphX computes Pagerank with NaN (with some datasets)

    [ https://issues.apache.org/jira/browse/SPARK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945057#comment-14945057 ] 

Khaled Ammar edited comment on SPARK-10945 at 10/6/15 1:57 PM:
---------------------------------------------------------------

Thank you again for your comment, Sean. I apologize if the details are unclear; this is my first bug report on JIRA.
I use the complete graph dataset from the URL in the issue description. It is in adjacency-list format, and I convert it to an edge list. It has 1.4 billion edges, so I did not check them manually. However, I have converted several larger datasets the same way, and they work fine with GraphX. Therefore, I *think* the format is not the problem. Moreover, when I run PageRank with zero iterations, the output is correct: all PageRank values are initialized to 0.15.
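Since the conversion step is the first thing to rule out, a small validating converter may help. The sketch below is a minimal Python version, assuming a hypothetical input layout of one `src dst1 dst2 ...` line per source vertex with whitespace-separated integer ids. It emits one `(src, dst)` pair per edge, which should match the one-pair-per-line edge list that GraphX's `GraphLoader.edgeListFile` reads, and it stops at the first non-integer token or at any id that does not fit in GraphX's 64-bit `VertexId`. The function and constant names are illustrative only.

```python
import re

# GraphX vertex ids are signed 64-bit integers (VertexId is an alias for Long).
MIN_ID, MAX_ID = -(2 ** 63), 2 ** 63 - 1
_INT_TOKEN = re.compile(r"-?[0-9]+")  # plain decimal integers only


def adjacency_to_edges(lines):
    """Yield (src, dst) pairs from adjacency-list lines.

    Hypothetical input layout: "src dst1 dst2 ..." with whitespace-separated
    integer ids, one source vertex per line. Blank lines and lines starting
    with '#' are skipped. Raises ValueError on the first malformed line.
    """
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue
        for tok in tokens:
            if not _INT_TOKEN.fullmatch(tok):
                raise ValueError(f"line {lineno}: non-integer token {tok!r}")
        ids = [int(tok) for tok in tokens]
        for vid in ids:
            if not MIN_ID <= vid <= MAX_ID:
                raise ValueError(f"line {lineno}: id {vid} does not fit in a 64-bit Long")
        src = ids[0]
        # A source with no neighbours yields no edge, so it will not appear in the edge list.
        for dst in ids[1:]:
            yield src, dst
```

For example, iterating `adjacency_to_edges(open("adjacency.txt"))` (the file name is a placeholder) and writing each pair as `src<TAB>dst` produces the edge list while failing loudly on the first suspicious line, which is cheaper than inspecting 1.4 billion edges by hand.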

I call the Analytics example with the following parameters ($1 is the number of iterations):
bin/run-example graphx.Analytics pagerank input/twitter --numEPart=50 --output=prSpark --numIter=$1
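To tell whether the NaNs come from the input or from the computation, one option is to run the same number of iterations on a small sample of the edge list outside Spark and compare. The sketch below is a single-machine reference, not GraphX's code. It assumes the un-normalized update rank(v) = 0.15 + 0.85 * sum(rank(u) / outDeg(u)) over the in-neighbours u of v, which is consistent with every vertex holding 0.15 after zero iterations. `static_pagerank` and `nan_vertices` are hypothetical helper names.

```python
import math
from collections import defaultdict

RESET_PROB = 0.15  # assumed reset probability, matching the reported 0.15 initial value


def static_pagerank(edges, num_iter, reset_prob=RESET_PROB):
    """Un-normalized static PageRank over a list of (src, dst) pairs.

    Single-machine reference for small samples; not GraphX's implementation.
    """
    out_deg = defaultdict(int)
    vertices = set()
    for src, dst in edges:
        out_deg[src] += 1
        vertices.update((src, dst))

    # Zero iterations: every vertex starts at reset_prob.
    ranks = {v: reset_prob for v in vertices}
    for _ in range(num_iter):
        incoming = defaultdict(float)
        for src, dst in edges:
            # Each source splits its rank evenly across its out-edges;
            # out_deg[src] >= 1 here because src appears in at least one edge.
            incoming[dst] += ranks[src] / out_deg[src]
        # Vertices with no in-edges receive 0.0 and fall back to reset_prob.
        ranks = {v: reset_prob + (1.0 - reset_prob) * incoming[v] for v in vertices}
    return ranks


def nan_vertices(ranks):
    """Return the sorted ids whose rank is NaN or infinite."""
    return sorted(v for v, r in ranks.items() if not math.isfinite(r))
```

Here `static_pagerank(edges, 1)` corresponds to `--numIter=1`. Finite inputs cannot produce a NaN in this reference, so if GraphX reports NaN for vertices of the same sample where the reference stays finite, the problem is more likely in how the data is loaded or processed than in the PageRank formula itself. Ranks computed on a sample will not match full-graph ranks; the comparison is only meaningful when both sides run on the same sampled edge list.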

If you are interested, I can upload the dataset I am using to Amazon AWS so that you can check it.


> GraphX computes Pagerank with NaN (with some datasets)
> ------------------------------------------------------
>
>                 Key: SPARK-10945
>                 URL: https://issues.apache.org/jira/browse/SPARK-10945
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.3.0
>         Environment: Linux
>            Reporter: Khaled Ammar
>              Labels: test
>
> Hi,
> I run GraphX on a medium-size standalone Spark 1.3.0 installation. PageRank typically works fine, except with one dataset (Twitter: http://law.di.unimi.it/webdata/twitter-2010). This is a public dataset that is commonly used in research papers.
> I found that many vertices have NaN values. This happens even if the algorithm runs for only 1 iteration.
> Thanks,
> -Khaled



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
