Posted to dev@spark.apache.org by RJ Nowling <rn...@gmail.com> on 2014/08/27 23:17:59 UTC

[GraphX] JIRA / PR to fix breakage in GraphGenerator.logNormalGraph in PR #720

Hi all,

 PR #720 <https://github.com/apache/spark/pull/720> made multiple changes
to GraphGenerator.logNormalGraph including:

   - Replacing the calls to the functions that generate random vertices and
   edges with in-line implementations that use different equations. Based on
   reading the Pregel paper, I believe the in-line implementations are
   incorrect.
   - Hard-coding the RNG seeds, so the method now always generates the same
   graph for a given number of vertices, edges, mu, and sigma. The user cannot
   override the seed or request a randomly generated one.
   - Making a backwards-incompatible change to the logNormalGraph signature by
   introducing a new required parameter.
   - Failing to update the Scaladoc and programming guide for the API changes.
   - Adding a synthetic benchmark to the examples.

I submitted JIRA SPARK-3263
<https://issues.apache.org/jira/browse/SPARK-3263> and PR #2168
<https://github.com/apache/spark/pull/2168> to revert some of these changes
and fix usage of the RNGs:

   - Removes the in-line implementations and calls the original vertex / edge
   generation functions again
   - Adds an optional seed parameter for deterministic behavior (when
   desired)
   - Keeps the number of partitions parameter that was added
   - Keeps compatibility with the synthetic benchmark example
   - Maintains a backwards-compatible API
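
To illustrate the optional-seed behavior described above, here is a minimal
sketch. It is written in Python for brevity, not in the Scala used by the PR,
and it is not the code from PR #2168. The names sample_log_normal and
out_degrees, the default mu and sigma values, and the rejection loop are all
assumptions made for the example.

```python
import math
import random
from typing import List, Optional


def sample_log_normal(mu: float, sigma: float, max_val: int,
                      rng: random.Random) -> int:
    """Draw floor(exp(mu + sigma * Z)) with Z ~ N(0, 1).

    Values >= max_val are rejected and redrawn (an assumption of this sketch).
    """
    if max_val <= 0:
        raise ValueError("max_val must be positive")
    while True:
        x = math.exp(mu + sigma * rng.gauss(0.0, 1.0))
        if x < max_val:
            return int(math.floor(x))


def out_degrees(num_vertices: int, mu: float = 4.0, sigma: float = 1.3,
                seed: Optional[int] = None) -> List[int]:
    """Hypothetical helper: one log-normal out-degree per vertex.

    seed=None  -> the generator is seeded from a system source, so each
                  call gives a different result.
    seed=<int> -> the same degree sequence on every call.
    """
    rng = random.Random(seed)
    return [sample_log_normal(mu, sigma, num_vertices, rng)
            for _ in range(num_vertices)]
```

Calling out_degrees(50, seed=42) twice returns the same list, while omitting
the seed gives a fresh sample each time. Because the seed is an optional
trailing argument, callers that never passed one keep working unchanged.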

I would appreciate it if people could take a look and give feedback.  :)

Thanks!
RJ

-- 
em rnowling@gmail.com
c 954.496.2314