Posted to user@giraph.apache.org by Martin Neumann <mn...@spotify.com> on 2014/03/24 15:22:49 UTC

Re: Giraph avro input format

I'm trying to put the GiraphJob I wrote into production so I need to fix
this.

The Avro file I need to read uses the (bytes) schema. Each
datum is actually just a number of words separated by whitespace.
The whole thing is not yet in graph format, so I need some logic in the
Giraph input format to build it.
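
To make the parsing step concrete, here is a minimal Java sketch of turning one such datum into something vertex-like. It relies on Avro's Java mapping of the "bytes" type to java.nio.ByteBuffer. Everything else is an assumption for illustration only: the names WordDatumParser and ParsedVertex are hypothetical, and the rule "first word is the vertex id, the remaining words are its neighbours" is a guess, since the thread does not say how the words map to a graph.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: turns one Avro "bytes" datum into a vertex id plus neighbour ids. */
final class WordDatumParser {

    /** Simple holder for the parsed result (not a Giraph class). */
    static final class ParsedVertex {
        final String id;
        final List<String> neighbours;

        ParsedVertex(String id, List<String> neighbours) {
            this.id = id;
            this.neighbours = Collections.unmodifiableList(neighbours);
        }
    }

    private WordDatumParser() { }

    /**
     * Decodes the datum as UTF-8 and splits it on runs of whitespace.
     * ASSUMPTION (not from the thread): first word = vertex id,
     * remaining words = neighbour ids.
     * Returns null when the datum contains no words.
     */
    static ParsedVertex parse(ByteBuffer datum) {
        // duplicate() leaves the caller's buffer position untouched
        String text = StandardCharsets.UTF_8.decode(datum.duplicate()).toString().trim();
        if (text.isEmpty()) {
            return null;
        }
        String[] words = text.split("\\s+");
        List<String> neighbours = new ArrayList<>();
        for (int i = 1; i < words.length; i++) {
            neighbours.add(words[i]);
        }
        return new ParsedVertex(words[0], neighbours);
    }
}
```

A record reader inside a custom input format could call parse() on each datum and build the actual Giraph vertex from the result; that wiring is left out here because it depends on the Giraph version in use.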

I tried using the MapReduce Avro formats but could not get them to work.
I haven't tried Gora yet, but it seems like overkill to add it just to
load and store Avro.

Does anyone have any ideas?

cheers Martin



On Tue, Feb 18, 2014 at 8:22 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:

> It would be interesting to see what can be done with Avro
> in Giraph natively along the lines of:
>    http://avro.apache.org/docs/1.7.6/mr.html
>
> Thanks,
> Roman.
>
> On Mon, Feb 17, 2014 at 11:49 AM, Claudio Martella
> <cl...@gmail.com> wrote:
> > I'm not sure about what I'm going to say, but Gora should read from Avro,
> > and we do support reading transparently through Gora. You could check that
> > out.
> >
> >
> > On Mon, Feb 17, 2014 at 1:32 PM, Martin Neumann <mn...@spotify.com>
> > wrote:
> >>
> >> Hej,
> >>
> >> Is there an Avro input format for Giraph? I saw some older (July 2013)
> >> entries on the mailing list and none existed by then. Have things changed
> >> since then, or do I have to write my own?
> >>
> >> If I write my own, what's a good base class to start from?
> >>
> >> cheers Martin
> >
> >
> >
> >
> > --
> >    Claudio Martella
> >
>

Re: Giraph avro input format

Posted by Martin Neumann <mn...@spotify.com>.
I tried to copy the functionality of the Avro MapReduce InputFormat, but it's
not very straightforward.
For a start, Giraph uses the abstract classes from mapreduce while Avro uses
the interfaces from mapred. So I can only copy part of the logic but not
the code.
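
One way to picture the mismatch described above is as an adapter problem. The sketch below uses simplified stand-in types rather than the real Hadoop classes, so it compiles on its own: OldStyleReader mimics the mapred-style contract (the caller owns reusable key/value objects and passes them to next), while NewStyleReader mimics the mapreduce-style contract (nextKeyValue, getCurrentKey, getCurrentValue). All type names here are hypothetical. The real Hadoop readers carry further methods (for example close and getProgress, plus initialize on the new-style class) and throw checked exceptions, which are left out to keep the sketch short.

```java
/** Stand-in for the old-style (mapred) contract: caller supplies reusable holders. */
interface OldStyleReader<K, V> {
    K createKey();
    V createValue();
    /** Fills key and value with the next record; returns false at end of input. */
    boolean next(K key, V value);
}

/** Stand-in for the new-style (mapreduce) contract: reader exposes the current pair. */
abstract class NewStyleReader<K, V> {
    abstract boolean nextKeyValue();
    abstract K getCurrentKey();
    abstract V getCurrentValue();
}

/** Wraps an old-style reader so it can be consumed through the new-style calls. */
final class OldToNewReaderAdapter<K, V> extends NewStyleReader<K, V> {
    private final OldStyleReader<K, V> delegate;
    private final K key;
    private final V value;

    OldToNewReaderAdapter(OldStyleReader<K, V> delegate) {
        this.delegate = delegate;
        // Allocate the holders once; the old-style reader overwrites them per record.
        this.key = delegate.createKey();
        this.value = delegate.createValue();
    }

    @Override boolean nextKeyValue() { return delegate.next(key, value); }
    @Override K getCurrentKey() { return key; }
    @Override V getCurrentValue() { return value; }
}

/** Tiny in-memory old-style reader, only here to exercise the adapter. */
final class InMemoryOldStyleReader implements OldStyleReader<int[], StringBuilder> {
    private final String[] records;
    private int index = 0;

    InMemoryOldStyleReader(String... records) { this.records = records.clone(); }

    @Override public int[] createKey() { return new int[1]; }
    @Override public StringBuilder createValue() { return new StringBuilder(); }

    @Override public boolean next(int[] key, StringBuilder value) {
        if (index >= records.length) {
            return false;
        }
        key[0] = index;              // record number as the key
        value.setLength(0);          // reuse the holder instead of allocating
        value.append(records[index]);
        index++;
        return true;
    }
}
```

Because the holders are reused, a consumer that wants to keep a value past the next call to nextKeyValue() has to copy it first. Whether wrapping is preferable to reimplementing the reading logic depends on the Avro and Giraph versions involved, which the thread does not specify.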

I need to process the Avro file to make it a graph (currently it's just
strings stored as Avro bytes). So I somehow need to pipe it into the
InputFormat I wrote.
The Gora example on the Giraph page seems to be more about loading a graph
through Gora where the file has a schema that is already a graph and does
not need preprocessing. So it does not seem to be what I'm looking for. I
haven't worked with Gora, but it seems to bring a lot of abstractions
and functionality that I don't need.

I will have a look at GoraInputFormat and see what I can learn from it.


cheers Martin


On Mon, Mar 24, 2014 at 5:11 PM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> I think you can try to see how the Avro MapReduce InputFormat works and
> use it on your own in Giraph. All the different input formats we have do
> that, so you can copy them. What exactly about the GoraInputFormat do you
> find overkill, and why?

Re: Giraph avro input format

Posted by Claudio Martella <cl...@gmail.com>.
I think you can try to see how the Avro MapReduce InputFormat works and use
it on your own in Giraph. All the different input formats we have do that,
so you can copy them. What exactly about the GoraInputFormat do you find
overkill, and why?




-- 
   Claudio Martella