Posted to dev@giraph.apache.org by "Jakob Homan (Created) (JIRA)" <ji...@apache.org> on 2012/01/31 23:17:59 UTC

[jira] [Created] (GIRAPH-139) Change PageRankBenchmark to use be accessible via bin/giraph

Change PageRankBenchmark to use be accessible via bin/giraph
------------------------------------------------------------

                 Key: GIRAPH-139
                 URL: https://issues.apache.org/jira/browse/GIRAPH-139
             Project: Giraph
          Issue Type: Improvement
            Reporter: Jakob Homan


Currently the PageRankBenchmark has its own main and tool implementation and is difficult to access from the bin/giraph script.  It would be better if everything were accessible via bin/giraph.  The benchmark is particularly problematic because it uses inner classes for its two actual Vertex implementations, which have to be specified on the command line by their .class name (i.e. org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather than just with dots, as one would expect.
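
To illustrate the naming issue: the JVM's binary name for a nested class joins the outer and nested names with '$', and that is the form Class.forName accepts, while the dotted form is only the canonical (source-level) name. The following is a minimal sketch using hypothetical stand-in classes (Outer, Inner), not the actual Giraph benchmark classes. Note also that on a typical shell the '$' has to be quoted or escaped, which adds to the awkwardness.

```java
// Hypothetical stand-ins for PageRankBenchmark and its nested vertex class.
class Outer {
    static class Inner { }
}

class NestedNameDemo {
    /** Returns true if the given name can be resolved by Class.forName. */
    static boolean loadable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String binary = Outer.Inner.class.getName();             // uses '$'
        String canonical = Outer.Inner.class.getCanonicalName(); // uses '.'
        System.out.println(binary + " loadable: " + loadable(binary));
        System.out.println(canonical + " loadable: " + loadable(canonical));
    }
}
```

Only the '$' form resolves at runtime, which is why the dotted name a user would naturally type does not work for a nested class.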

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (GIRAPH-139) Change PageRankBenchmark to use be accessible via bin/giraph

Posted by "Jakob Homan (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan reassigned GIRAPH-139:
----------------------------------

    Assignee: Jakob Homan
    

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205087#comment-13205087 ] 

Jakob Homan commented on GIRAPH-139:
------------------------------------

@Avery - does this new approach work for you?
                
> Change PageRankBenchmark to be accessible via bin/giraph
> --------------------------------------------------------
>
>                 Key: GIRAPH-139
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-139
>             Project: Giraph
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-139-b.patch, GIRAPH-139.patch
>
>
> Currently the PageRankBenchmark has its own main and tool implementation and is difficult to access from the bin/giraph script.  It would be better if everything were accessible via bin/giraph.  The benchmark is particularly problematic because it uses inner classes for its two actual Vertex implementations, which have to be specified on the command line as their .class name(ie org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather than just with dots, as one would expect.


        

[jira] [Updated] (GIRAPH-139) Change PageRankBenchmark to use be accessible via bin/giraph

Posted by "Jakob Homan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-139:
-------------------------------

    Attachment: GIRAPH-139.patch

Patch that makes the page rank benchmark accessible from bin/giraph.  The default is the edgelist-based vertex, but the hashmap-based vertex is available as a separate class.  Calling the example is a bit hairy, since it's now not restricted to just the pseudoinputformat:

{noformat}
bin/giraph \
-DPageRankBenchmark.superstepCount=200 \
-DpseudoRandomVertexReader.aggregateVertices=220 \
-DpseudoRandomVertexReader.edgesPerVertex=37 \
lib/giraph-0.2-SNAPSHOT.jar \
org.apache.giraph.benchmark.HashMapPageRankBenchmark \
-w 10 \
-if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat \
-of org.apache.giraph.lib.AdjacencyListTextVertexOutputFormat \
-op benchmark_results
{noformat}
I'm thinking that allowing vertices to provide default in/outputformat via annotations may be a way to avoid some of this extra text.
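
As a rough illustration of the annotation idea, here is a minimal sketch. Everything in it is hypothetical: the DefaultFormats annotation, the AnnotatedVertex and PlainVertex classes, and the FormatResolver helper are invented for this example and are not part of Giraph. It only shows how a runner could fall back to annotation-supplied defaults when -if or -of is omitted.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation; RUNTIME retention so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DefaultFormats {
    String input();
    String output();
}

// Hypothetical vertex class that declares its preferred formats.
@DefaultFormats(
    input = "org.apache.giraph.benchmark.PseudoRandomVertexInputFormat",
    output = "org.apache.giraph.lib.AdjacencyListTextVertexOutputFormat")
class AnnotatedVertex { }

// Hypothetical vertex class with no declared defaults.
class PlainVertex { }

class FormatResolver {
    /** Command-line value wins; otherwise the annotation default; otherwise null. */
    static String inputFormat(Class<?> vertexClass, String cliValue) {
        if (cliValue != null) {
            return cliValue;
        }
        DefaultFormats defaults = vertexClass.getAnnotation(DefaultFormats.class);
        return defaults == null ? null : defaults.input();
    }

    /** Same resolution order as inputFormat, for the output side. */
    static String outputFormat(Class<?> vertexClass, String cliValue) {
        if (cliValue != null) {
            return cliValue;
        }
        DefaultFormats defaults = vertexClass.getAnnotation(DefaultFormats.class);
        return defaults == null ? null : defaults.output();
    }

    public static void main(String[] args) {
        System.out.println(inputFormat(AnnotatedVertex.class, null));
        System.out.println(outputFormat(AnnotatedVertex.class, null));
    }
}
```

With something along these lines, the -if and -of arguments could be dropped from the command whenever the vertex class carries usable defaults.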

Tests, javadoc and rat pass (except that rat complains about CODE_CONVENTIONS, which is out of scope for this JIRA).

Once this is committed, we need to update the wiki...
                

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203891#comment-13203891 ] 

Jakob Homan commented on GIRAPH-139:
------------------------------------

bq. However, I don't see why we need to remove the run() and main() methods from PageRankBenchmark.java. Why not have both methods to run the benchmark?
My concern is twofold: code duplication, in that most of the code in PageRankBenchmark duplicates code in GiraphRunner, and user confusion over which approach is correct.  I ran into issues trying to run the benchmark via main.  Also, since PageRankBenchmark had to be refactored into separate classes to support the two vertex types, it will require adjusting the main driver code, which means we're fixing duplicated code already.  Is it better to work on making bin/giraph easier to use than to expend that energy on maintaining duplicate code?  

Eventually, I would like to get the benchmark code into the examples directory and have it work the same way the example jar for Hadoop does: one can do bin/giraph giraph-examples.jar and be presented with all the example programs available and how to run them.
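
A minimal sketch of what such an examples driver could look like, in plain Java. The class and method names here (ExampleDriverSketch, Program, add, usage, run) are hypothetical and chosen for illustration only; this is not Hadoop's or Giraph's actual driver code. With no arguments or an unknown name it lists the registered programs, otherwise it dispatches to the named one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class ExampleDriverSketch {
    /** Entry point of one registered example; returns an exit code. */
    interface Program {
        int run(String[] args);
    }

    // Insertion-ordered so the usage listing is stable.
    private final Map<String, String> descriptions = new LinkedHashMap<>();
    private final Map<String, Program> programs = new LinkedHashMap<>();

    void add(String name, String description, Program program) {
        descriptions.put(name, description);
        programs.put(name, program);
    }

    /** Builds the listing shown when no valid program name is given. */
    String usage() {
        StringBuilder sb = new StringBuilder("Valid program names are:\n");
        for (Map.Entry<String, String> e : descriptions.entrySet()) {
            sb.append("  ").append(e.getKey()).append(": ")
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Dispatches to args[0], passing the remaining arguments through. */
    int run(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.err.print(usage());
            return -1;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        return programs.get(args[0]).run(rest);
    }

    public static void main(String[] args) {
        ExampleDriverSketch driver = new ExampleDriverSketch();
        // Placeholder entry; a real driver would launch the benchmark job here.
        driver.add("pagerank", "PageRank benchmark (placeholder)", rest -> {
            System.out.println("would run pagerank with " + rest.length + " args");
            return 0;
        });
        System.exit(driver.run(args));
    }
}
```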
                

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203877#comment-13203877 ] 

Avery Ching commented on GIRAPH-139:
------------------------------------

Hi Jakob, I generally agree with what you have done and was able to use bin/giraph to execute your command (one minor change).

{quote}
./bin/giraph -DPageRankBenchmark.superstepCount=200  -DpseudoRandomVertexReader.aggregateVertices=220 -DpseudoRandomVertexReader.edgesPerVertex=37 target/giraph-0.2-SNAPSHOT.jar org.apache.giraph.benchmark.HashMapVertexPageRankBenchmark  -w 2  -if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat -of org.apache.giraph.lib.AdjacencyListTextVertexOutputFormat -op benchmark_results
{quote}

However, I don't see why we need to remove the run() and main() methods from PageRankBenchmark.java.  Why not have both methods to run the benchmark?  As you've already mentioned, it is a bit verbose to run the above command.  I agree that using bin/giraph is probably the right way to go in the future, however.  Once bin/giraph is nearly as easy to run as invoking main() directly, main() won't be necessary.

One very minor comment:

HashMapVertexPageRankBenchmark.java:28 - Benchmark -> benchmark
                

        

[jira] [Updated] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Jakob Homan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-139:
-------------------------------

    Attachment: GIRAPH-139-b.patch

New patch that maintains PageRankBenchmark as a separately runnable program, but one that can be quickly deleted once the example jar is sorted.
                

        

[jira] [Updated] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Jakob Homan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-139:
-------------------------------

    Summary: Change PageRankBenchmark to be accessible via bin/giraph  (was: Change PageRankBenchmark to use be accessible via bin/giraph)
    

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203900#comment-13203900 ] 

Avery Ching commented on GIRAPH-139:
------------------------------------

I agree the main() and run() code should be deprecated, but preferably after giraph-examples.jar is ready =).  
                

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203958#comment-13203958 ] 

Jakob Homan commented on GIRAPH-139:
------------------------------------

How about I add back in the main and run as deprecated, leave it in for developers, and change the wiki to use bin/giraph for the example, with an eye to removing it as soon as the example jar is set up?
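
A rough sketch of that arrangement, under assumed names (SharedRunner and LegacyBenchmark are hypothetical, not the classes in the patch): the old entry points stay but are marked @Deprecated and simply delegate to the shared code path, so nothing is duplicated and removing them later is a small change.

```java
// Hypothetical shared code path, standing in for the common runner.
class SharedRunner {
    /** Pretends to launch a job for the given vertex class; returns an exit code. */
    static int run(String vertexClassName, String[] args) {
        System.out.println("running " + vertexClassName
            + " with " + args.length + " args");
        return 0;
    }
}

// Hypothetical benchmark class that keeps its old entry points for developers.
class LegacyBenchmark {
    /** @deprecated kept for developers; launch through the shared runner instead. */
    @Deprecated
    static int run(String[] args) {
        // No job setup here: everything is delegated to the shared path.
        return SharedRunner.run("LegacyBenchmarkVertex", args);
    }

    /** @deprecated kept for developers; launch through the shared runner instead. */
    @Deprecated
    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```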
                

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203967#comment-13203967 ] 

Avery Ching commented on GIRAPH-139:
------------------------------------

sounds good to me.
                

        

[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205598#comment-13205598 ] 

Avery Ching commented on GIRAPH-139:
------------------------------------

+1
Looks good to me.
                