Posted to dev@hama.apache.org by "Thomas Jungblut (JIRA)" <ji...@apache.org> on 2011/06/17 00:18:47 UTC

[jira] [Issue Comment Edited] (HAMA-358) Evaluation of Hama BSP communication protocol performance

    [ https://issues.apache.org/jira/browse/HAMA-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050509#comment-13050509 ] 

Thomas Jungblut edited comment on HAMA-358 at 6/16/11 10:17 PM:
----------------------------------------------------------------

How is it going? Do you need help?

I've experimented a bit with ProtoBuf.
It seems to be really cool and fast (or at least small).

BUT I'm really struggling with the question of how we are (or better, how we could be) able to integrate it.

Let's summarize our current state:
We have an abstract class BSPMessage which leaves the types of tags and data to the concrete implementations.
Via Writable, this goes into Hadoop's RPC mechanism, where it gets serialized and deserialized.
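
As a rough illustration of that current state, here is a minimal sketch. It is not Hama's actual code: SketchMessage and DoubleSketchMessage are hypothetical names, and the Writable interface below is a local stand-in that mirrors the two methods of Hadoop's org.apache.hadoop.io.Writable so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableRoundTrip {
  // Serialize any message into a byte array, as an RPC layer would before sending.
  static byte[] serialize(Writable message) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      message.write(new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Rebuild a message on the receiving side from the raw bytes.
  static DoubleSketchMessage deserialize(byte[] raw) {
    try {
      DoubleSketchMessage message = new DoubleSketchMessage();
      message.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
      return message;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] raw = serialize(new DoubleSketchMessage("dist", 1.5));
    DoubleSketchMessage back = deserialize(raw);
    System.out.println(back.getTag() + " = " + back.getData() + " (" + raw.length + " bytes)");
  }
}

// Local stand-in for Hadoop's Writable interface (assumption: same two methods).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical abstract message: tag and data types are left to subclasses.
abstract class SketchMessage<T, D> implements Writable {
  abstract T getTag();
  abstract D getData();
}

// Hypothetical concrete message with a String tag and a double payload.
class DoubleSketchMessage extends SketchMessage<String, Double> {
  private String tag;
  private double data;

  DoubleSketchMessage() {} // no-arg constructor, needed before readFields is called

  DoubleSketchMessage(String tag, double data) {
    this.tag = tag;
    this.data = data;
  }

  @Override String getTag() { return tag; }
  @Override Double getData() { return data; }

  @Override public void write(DataOutput out) throws IOException {
    out.writeUTF(tag);     // 2-byte length prefix plus the encoded characters
    out.writeDouble(data); // fixed 8 bytes
  }

  @Override public void readFields(DataInput in) throws IOException {
    tag = in.readUTF();
    data = in.readDouble();
  }
}
```

The point of the sketch is that the concrete class, not the framework, decides what the bytes look like. That is the property any ProtoBuf integration would have to preserve or replace.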

What I'm wondering now is how we can improve this using ProtoBuf. 
ProtoBuf needs a "*.proto" file that has to be compiled into a specific model *.java file. In this file you declare what the message contains. In our case this is not known at compile time (for example, a user implements a custom vertex that contains distances or something like that). So we have to leave the serialization up to the user.
The question is: how could we do this?

Here are some thoughts (don't take this too seriously, it's just brainstorming):
There are two options:

 * A generic ".proto" model that takes just two Strings for a tag and data
 * We leave the compiling and implementing of the protos to the user

The first is ultra simple and we don't have to worry about anything the user will submit, since you can serialize everything to strings. 
But I think we would "ruin" the optimizations that COULD have been made if the type were known.
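
To make that trade-off concrete, the following sketch compares the size of a double sent as text with the same double in fixed-width binary form. It uses only the Java standard library, not ProtoBuf itself. The class and method names are hypothetical, and the 8-byte figure is the raw payload only (ProtoBuf's double type is also a fixed 64-bit value, with a small field tag added on top).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StringVersusTypedSize {
  // Option one: the value travels as text inside a generic string field.
  static int sizeAsString(double value) {
    return Double.toString(value).getBytes(StandardCharsets.UTF_8).length;
  }

  // Typed encoding: a double is always 8 bytes in fixed-width binary form.
  static int sizeAsBinary(double value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DataOutputStream(bytes).writeDouble(value);
      return bytes.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    double distance = 1234.567890123; // example value, chosen for illustration
    System.out.println("as string: " + sizeAsString(distance) + " bytes");
    System.out.println("as binary: " + sizeAsBinary(distance) + " bytes");
  }
}
```

The result depends on the value: a short number such as 1.5 is actually smaller as text, while values with many significant digits grow well past 8 bytes. The string route also adds formatting and parsing work on both ends, which this sketch does not measure.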

The second option is really messy and not too user-friendly (since the user has to compile the whole thing and put it into a repository at runtime so that we know the proto), but in contrast to the first option it could give better results.
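
One way the second option could look is sketched below, purely as an assumption: the framework ships opaque bytes plus a type name, and the user registers a parser for that name at runtime. UserCodecRegistry and the "distance" type are hypothetical. With ProtoBuf, the registered function would presumably wrap the generated class's static parseFrom(byte[]) method; here a plain ByteBuffer codec stands in for it so the example runs without ProtoBuf.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class UserCodecDemo {
  // Stand-ins for what a user's generated message class would provide.
  static byte[] encodeDistance(double distance) {
    return ByteBuffer.allocate(Double.BYTES).putDouble(distance).array();
  }

  static Double decodeDistance(byte[] raw) {
    return ByteBuffer.wrap(raw).getDouble();
  }

  public static void main(String[] args) {
    UserCodecRegistry registry = new UserCodecRegistry();
    // The user tells the framework how to parse messages of type "distance".
    registry.register("distance", UserCodecDemo::decodeDistance);

    // The framework only sees a type name and bytes, and delegates the parsing.
    Object decoded = registry.decode("distance", encodeDistance(42.5));
    System.out.println(decoded);
  }
}

// Hypothetical registry mapping a message type name to a user-supplied parser.
class UserCodecRegistry {
  private final Map<String, Function<byte[], ?>> parsers = new HashMap<>();

  <T> void register(String typeName, Function<byte[], T> parser) {
    parsers.put(typeName, parser);
  }

  Object decode(String typeName, byte[] payload) {
    Function<byte[], ?> parser = parsers.get(typeName);
    if (parser == null) {
      throw new IllegalArgumentException("No parser registered for " + typeName);
    }
    return parser.apply(payload);
  }
}
```

This keeps the typed encoding but shows where the messiness comes from: every worker must have the user's compiled classes and the registration in place before the first message arrives.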

Do we have other options?

      was (Author: thomas.jungblut):
    How is it going? Do you need help?
  
> Evaluation of Hama BSP communication protocol performance
> ---------------------------------------------------------
>
>                 Key: HAMA-358
>                 URL: https://issues.apache.org/jira/browse/HAMA-358
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp, documentation 
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>   Original Estimate: 1,008h
>  Remaining Estimate: 1,008h
>
> The goal of this project is performance evaluation of RPC frameworks (e.g., Hadoop RPC, Thrift, Google Protobuf, ..., etc) to figure out which is the best solution for Hama BSP communication. Currently Hama is using Hadoop RPC to communicate and transfer messages between BSP workers.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira