Posted to issues@spark.apache.org by "npanj (JIRA)" <ji...@apache.org> on 2014/08/23 02:30:11 UTC

[jira] [Created] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere

npanj created SPARK-3190:
----------------------------

             Summary: Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
                 Key: SPARK-3190
                 URL: https://issues.apache.org/jira/browse/SPARK-3190
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.0.3
         Environment: Standalone mode running on EC2 
            Reporter: npanj
            Priority: Critical


While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result, while 'numEdges' reports the correct number. A few times (with different datasets of more than 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect that there is an overflow somewhere (maybe we are using Int for some field?).

Here are some details of the experiments I have done so far:
1. Input: numNodes=6101995593 ; noEdges=12163784626
   Graph returns: numVertices=1807028297 ; numEdges=12163784626

2. Input: numNodes=2157586441 ; noEdges=2747322705
   Graph returns: numVertices=-2137380855 ; numEdges=2747322705

3. Input: numNodes=1725060105 ; noEdges=204176821
   Graph returns: numVertices=1725060105 ; numEdges=2041768213
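
The vertex counts above can be checked against the overflow suspicion with a little arithmetic. The sketch below is not GraphX code. It only emulates, in Python, what happens when a 64-bit count is narrowed to a signed 32-bit Int (two's complement wraparound). The helper name wrap_to_int32 is made up for this illustration. Under that assumption, the wrapped values match the reported numVertices in all three experiments, which is consistent with a 32-bit truncation somewhere in the vertex count, although it does not show where.

```python
def wrap_to_int32(n: int) -> int:
    """Emulate narrowing a 64-bit count to a signed 32-bit Int.

    Hypothetical helper for illustration only: it keeps the low 32 bits
    of n and reinterprets them as a two's complement signed value.
    """
    return (n + 2**31) % 2**32 - 2**31


# (expected numNodes, numVertices reported by the graph), taken from
# the three experiments listed above.
cases = [
    (6101995593, 1807028297),
    (2157586441, -2137380855),
    (1725060105, 1725060105),
]

for expected_nodes, reported in cases:
    wrapped = wrap_to_int32(expected_nodes)
    # True means the reported value equals the 32-bit wrapped count.
    print(expected_nodes, wrapped, wrapped == reported)
```

The third case fits in 32 bits (it is below 2^31 = 2147483648), so it passes through unchanged. The first two exceed that limit and wrap.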

You can find the code to reproduce this bug here:

https://gist.github.com/npanj/92e949d86d08715bf4bf

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org