Posted to dev@giraph.apache.org by "Jakob Homan (Updated) (JIRA)" <ji...@apache.org> on 2011/11/01 03:00:32 UTC

[jira] [Updated] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations

     [ https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-64:
------------------------------

    Attachment: GIRAPH-64.patch

Here's a patch that introduces that old bin folder we all know and lo{ve|athe}.  It also gives us the start of the packaging we'll need once we begin thinking about releases.  Users no longer have to merge their code into the Giraph source to get it to run.
With the new bin/giraph, assuming an implementation of Vertex such as (taken from the pagerankbenchmark, obviously):
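{code}import java.util.Iterator;

// Vertex package as of current trunk; adjust if the class has moved.
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;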

public class FirstVertex extends
    Vertex<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
    /** Configuration from Configurable */
    private Configuration conf;

    /** Configuration key: how many supersteps to run */
    public static final String SUPERSTEP_COUNT = "PageRankBenchmark.superstepCount";

    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException {
    }

    @Override
    public void postApplication() {
    }

    @Override
    public void preSuperstep() {
    }

    @Override
    public void compute(Iterator<DoubleWritable> msgIterator) {
        // From superstep 1 on, fold the incoming messages into a new value.
        if (getSuperstep() >= 1) {
            double sum = 0;
            while (msgIterator.hasNext()) {
                sum += msgIterator.next().get();
            }
            DoubleWritable vertexValue =
                new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
            setVertexValue(vertexValue);
        }

        // Keep sending until the configured superstep count is reached; if
        // the count is unset, the default of -1 makes the vertex halt at once.
        if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
            long edges = getNumOutEdges();
            sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
        } else {
            voteToHalt();
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

}{code}
one can run it via:
{noformat}bin/giraph \
-DPageRankBenchmark.superstepCount=30 \
-DpseduoRandomVertexReader.aggregateVertices=220 \
-DpseduoRandomVertexReader.edgesPerVertex=37 \
~/kick-ass-vertex-1.0.jar giraph1.FirstVertex \
-w 10 \
-if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat \
-of org.apache.giraph.lib.JsonBase64VertexOutputFormat \
-op output_path{noformat}
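For anyone unfamiliar with the trailing options, here is a rough sketch of how the "-flag value" pairs in the command above might be read. It assumes -w is the worker count, -if and -of are the vertex input and output format classes, and -op is the output path. The class and method names (RunnerArgs, parse) are hypothetical; this is plain Java with no Hadoop or Giraph dependency, and it is not the parsing code in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RunnerArgs {
    /**
     * Collects "-flag value" pairs into a map, in the order given.
     * Assumes every flag is followed by exactly one value.
     */
    public static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<String, String>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            if (!flag.startsWith("-") || i + 1 >= args.length) {
                throw new IllegalArgumentException(
                    "Expected '-flag value', got: " + flag);
            }
            // Store the flag without its leading dash.
            options.put(flag.substring(1), args[i + 1]);
        }
        return options;
    }

    public static void main(String[] argv) {
        // The option values are copied from the command above.
        Map<String, String> options = parse(new String[] {
            "-w", "10",
            "-if", "org.apache.giraph.benchmark.PseudoRandomVertexInputFormat",
            "-of", "org.apache.giraph.lib.JsonBase64VertexOutputFormat",
            "-op", "output_path"
        });
        int workers = Integer.parseInt(options.get("w"));
        System.out.println("workers=" + workers);
        System.out.println("inputFormat=" + options.get("if"));
        System.out.println("outputFormat=" + options.get("of"));
        System.out.println("outputPath=" + options.get("op"));
    }
}
```

The -D properties and the jar/class pair that precede these options are not handled here; in a Hadoop Tool those would typically be consumed before the remaining arguments reach the runner.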
bin/giraph is heavily cribbed from Mahout and Pig, btw.
Is there any reason the fatjar approach was taken other than expediency?  This patch uses the fatjar approach for testing, but uses a standard lib folder approach for the actual package.  I'd like to remove the fatjar entirely, eventually.

This is a rough script and will need lots of enhancements as we go, but I think it's a good start.
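
As an aside, the arithmetic in compute() above can be checked without a cluster. Below is a minimal plain-Java sketch of the same update: the new value is 0.15 / numVertices + 0.85 * (sum of incoming messages), and each outgoing edge is then sent value / numOutEdges. The class PageRankStep, its method names, and the sample numbers are made up for illustration and are not part of Giraph or the patch.

```java
public class PageRankStep {
    /**
     * New vertex value from the sum of incoming messages.
     * Uses the same float literals as compute() in FirstVertex.
     */
    public static double updatedValue(double messageSum, long numVertices) {
        return (0.15f / numVertices) + 0.85f * messageSum;
    }

    /** Share of the vertex value sent along each outgoing edge. */
    public static double messagePerEdge(double vertexValue, long numOutEdges) {
        return vertexValue / numOutEdges;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 220 vertices, three incoming messages,
        // 37 outgoing edges (matching the -D values in the command above).
        double[] incoming = {0.001, 0.002, 0.0005};
        double sum = 0;
        for (double m : incoming) {
            sum += m;
        }
        double value = updatedValue(sum, 220L);
        System.out.println("new value = " + value);
        System.out.println("message per edge = " + messagePerEdge(value, 37L));
    }
}
```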
                
> Create VertexRunner to make it easier to run users' computations
> ----------------------------------------------------------------
>
>                 Key: GIRAPH-64
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-64
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-64.patch
>
>
> Currently, if a user wants to implement a Giraph algorithm by extending {{Vertex}}, they must also write all the boilerplate around the {{Tool}} interface and bundle it with the Giraph jar (or get Giraph on the classpath and playing nicely with the implementation).  Examples are the PageRankBenchmark and what Kohei has done: https://github.com/smly/java-Giraph-LabelPropagation  It would be better to have a Vertex implementation to subclass that already includes all the standard tooling, so that all one had to run would be (assuming the Giraph jar was already on the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat} This wouldn't work with every algorithm, but would be useful in a large number of cases.
