Posted to dev@giraph.apache.org by Armando Miraglia <a....@student.vu.nl> on 2013/07/04 17:19:45 UTC

A new InputFormat. What to extend?

Hi guys.

I am currently trying to implement a PoC for the issue GIRAPH-549 (which
btw is the main topic of my GSoC project).

As suggested in the issue by Claudio I looked at the Faunus
implementation to connect to Rexster and get the data but at the moment
I am overwhelmed by all the available classes.

My question is the following: Faunus's approach is to create an
InputFormat extending directly from the Hadoop InputFormat class. I
however saw that some classes in Giraph extend directly from Hadoop
classes while others extend from VertexInputFormat (like
TextVertexInputFormat). So what would be the best choice I could make? I
started extending VertexInputFormat but an opinion from you would be
much appreciated.

If you need any additional details just let me know.

Cheers,
Armando

Re: A new InputFormat. What to extend?

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Eli,

You are right, that's what we did. In my case, I used the VertexReader class
to transform Apache Gora data beans into org.apache.giraph.graph.Vertex
objects. In this way, neither Giraph's code nor Gora's is modified; the data is
just "converted" into the vertices or edges that Giraph needs (:
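
To make the idea concrete, here is a minimal, self-contained sketch of that
kind of reader-style conversion. It does not use the real Giraph or Gora
classes: PersonBean, SimpleVertex and BeanVertexReader are hypothetical
stand-ins invented for illustration, and the nextVertex()/getCurrentVertex()
names are only meant to resemble a reader loop. The actual contract should be
checked against Giraph's VertexReader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BeanToVertexSketch {

    /** Hypothetical stand-in for a storage-layer data bean (not a real Gora class). */
    static final class PersonBean {
        final long id;
        final double score;
        final Map<Long, Float> links; // target vertex id -> edge weight

        PersonBean(long id, double score, Map<Long, Float> links) {
            this.id = id;
            this.score = score;
            this.links = links;
        }
    }

    /** Hypothetical stand-in for a vertex (not org.apache.giraph.graph.Vertex). */
    static final class SimpleVertex {
        final long id;
        final double value;
        final Map<Long, Float> edges;

        SimpleVertex(long id, double value, Map<Long, Float> edges) {
            this.id = id;
            this.value = value;
            this.edges = edges;
        }
    }

    /** Reader-style adapter: pulls beans from any iterator and hands out vertices. */
    static final class BeanVertexReader {
        private final Iterator<PersonBean> source;
        private PersonBean current;

        BeanVertexReader(Iterator<PersonBean> source) {
            this.source = source;
        }

        /** Advances to the next bean; returns false when the source is exhausted. */
        boolean nextVertex() {
            if (!source.hasNext()) {
                current = null;
                return false;
            }
            current = source.next();
            return true;
        }

        /** Converts the current bean; neither side's classes need to change. */
        SimpleVertex getCurrentVertex() {
            if (current == null) {
                throw new IllegalStateException("call nextVertex() first");
            }
            // Copy the link map so the vertex does not share state with the bean.
            return new SimpleVertex(current.id, current.score,
                    new LinkedHashMap<>(current.links));
        }
    }

    /** Drains the reader into a list, the way a loader loop might. */
    static List<SimpleVertex> readAll(Iterator<PersonBean> beans) {
        BeanVertexReader reader = new BeanVertexReader(beans);
        List<SimpleVertex> vertices = new ArrayList<>();
        while (reader.nextVertex()) {
            vertices.add(reader.getCurrentVertex());
        }
        return vertices;
    }

    public static void main(String[] args) {
        Map<Long, Float> links = new LinkedHashMap<>();
        links.put(2L, 0.5f);
        List<PersonBean> beans = new ArrayList<>();
        beans.add(new PersonBean(1L, 3.0, links));
        for (SimpleVertex v : readAll(beans.iterator())) {
            System.out.println(v.id + " value=" + v.value + " edges=" + v.edges);
        }
    }
}
```

The point of the sketch is only that the conversion lives in the reader, so
the bean type and the vertex type stay unaware of each other.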


Renato M.


2013/7/26 Armando Miraglia <a....@student.vu.nl>

> On Fri, Jul 26, 2013 at 01:54:38PM -0700, Eli Reisman wrote:
> > My instinct is you want to start from one of Giraph's higher-level
> > abstractions instead of a raw Hadoop InputFormat.
>
> I actually finished and got the patch committed :)
>
> GIRAPH-549
>
> Cheers,
> Armando
>

Re: A new InputFormat. What to extend?

Posted by Armando Miraglia <a....@student.vu.nl>.
On Fri, Jul 26, 2013 at 01:54:38PM -0700, Eli Reisman wrote:
> My instinct is you want to start from one of Giraph's higher-level
> abstractions instead of a raw Hadoop InputFormat.

I actually finished and got the patch committed :)

GIRAPH-549

Cheers,
Armando

Re: A new InputFormat. What to extend?

Posted by Eli Reisman <ap...@gmail.com>.
My instinct is you want to start from one of Giraph's higher-level
abstractions instead of a raw Hadoop InputFormat.


On Thu, Jul 11, 2013 at 4:47 PM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hi Armando,
>
> I really understand what you're saying about the input formats because I am
> also writing an integration with Apache Gora and I am facing the same
> problems. This is because Gora does not rely directly on Hadoop input
> formats but Giraph does.
> I think an alternative would be to write an abstraction for input formats
> which would have to be agnostic to how data is serialized. In this way,
> Giraph could read and write data from any data source without directly
> depending on Hadoop's input format.
> On the other hand, we could extend Hadoop input formats and let them live in
> their corresponding modules. IMHO the former option would be a better
> choice for extensibility and modularity purposes.
>
> Renato M.

Re: A new InputFormat. What to extend?

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Armando,

I really understand what you're saying about the input formats because I am
also writing an integration with Apache Gora and I am facing the same
problems. This is because Gora does not rely directly on Hadoop input
formats but Giraph does.
I think an alternative would be to write an abstraction for input formats
which would have to be agnostic to how data is serialized. In this way,
Giraph could read and write data from any data source without directly
depending on Hadoop's input format.
On the other hand, we could extend Hadoop input formats and let them live in
their corresponding modules. IMHO the former option would be a better
choice for extensibility and modularity purposes.
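
As a rough illustration of the first option, the sketch below shows what a
serialization-agnostic input abstraction could look like. Everything in it is
an assumption made for the example: RecordSource, InMemorySource and loadAll
are hypothetical names, not existing Giraph, Gora or Hadoop APIs, and the
"id:targets" line format is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class SourceAgnosticInputSketch {

    /** Hypothetical abstraction: a data source that can be partitioned and iterated. */
    interface RecordSource<R> {
        int splitCount();

        Iterator<R> open(int split);
    }

    /** Trivial in-memory source; a Hadoop- or Gora-backed one would implement the same interface. */
    static final class InMemorySource<R> implements RecordSource<R> {
        private final List<List<R>> partitions;

        InMemorySource(List<List<R>> partitions) {
            this.partitions = partitions;
        }

        @Override
        public int splitCount() {
            return partitions.size();
        }

        @Override
        public Iterator<R> open(int split) {
            return partitions.get(split).iterator();
        }
    }

    /** The loader sees only RecordSource and a converter, never a concrete storage format. */
    static <R, V> List<V> loadAll(RecordSource<R> source, Function<R, V> toVertex) {
        List<V> out = new ArrayList<>();
        for (int split = 0; split < source.splitCount(); split++) {
            Iterator<R> records = source.open(split);
            while (records.hasNext()) {
                out.add(toVertex.apply(records.next()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two partitions of made-up "id:targets" lines.
        RecordSource<String> source = new InMemorySource<>(Arrays.asList(
                Arrays.asList("1:2,3", "2:3"),
                Arrays.asList("3:")));
        List<String> ids = loadAll(source,
                line -> "vertex-" + line.substring(0, line.indexOf(':')));
        System.out.println(ids); // prints [vertex-1, vertex-2, vertex-3]
    }
}
```

With this shape, supporting a new data source would mean writing one more
RecordSource implementation, while the loading code stays the same.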

Renato M.