Posted to user@giraph.apache.org by Ferenc Béres <fe...@gmail.com> on 2014/06/19 10:02:47 UTC

How to output into multiple files through a GiraphJob

Hi Everyone,

Currently I'm working on an ALS implementation in Giraph 1.1.0, and I would
like to output the values of the vertices into multiple output files, but I
could not figure out how to do it.

I found that in Hadoop it can be done by using
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs<KEYOUT,VALUEOUT>,
but it didn't work with the GiraphJob.

Is it possible to output into multiple files by configuring the GiraphJob,
or is there another way?

I would appreciate any idea in this matter.

Thank you,
Ferenc Béres

Re: How to output into multiple files through a GiraphJob

Posted by Ferenc Béres <fe...@gmail.com>.
Hi John,

Yes, I also maintain several pieces of state within my vertex values, so I use
BasicComputation.

Regarding the output, I use this configuration:

    GiraphConfiguration conf = new GiraphConfiguration();
    conf.setVertexOutputFormatClass(ColumnOutputFormat.class);
    // the job name "ALS" is only illustrative
    GiraphJob giraphJob = new GiraphJob(conf, "ALS");
    FileOutputFormat.setOutputPath(giraphJob.getInternalJob(), resultPath);

(ColumnOutputFormat extends TextVertexOutputFormat)

But how can I set multiple output paths for my GiraphJob? I may be wrong,
but as far as I can see, only one output format and one output path can be
configured for the job.

Could you give me a short code example of how to configure the GiraphJob
or the VertexOutputFormat to write into multiple output files? :)

Thank you,
Ferenc

(In reply to John Yost <so...@gmail.com>, 2014-06-19 14:02 GMT+02:00)

Re: How to output into multiple files through a GiraphJob

Posted by John Yost <so...@gmail.com>.
Hi Ferenc,

I have a Giraph job that writes its output from the Computation class rather
than from the MasterCompute, because I need to maintain a lot of state within
vertex values rather than in aggregators. Writing from the Computation is one
way of producing results as multiple files. I am assuming, of course, that you
want to scope the output files per sub-graph grouping of vertices. :)

--John
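A minimal sketch of the approach John describes, written as plain Java so that it stands on its own rather than against the Giraph API: each worker keeps one output file per group of vertices (for ALS, for example, one group for user vertices and one for item vertices) and appends one text line per vertex to that group's file. The class name GroupedLineWriter, the group names, and the "<group>-<taskId>" file-naming scheme are hypothetical; putting a task ID into the file name is an assumption meant to stop parallel workers from writing to the same file. In a real Giraph job the files would more likely be created on HDFS through Hadoop's FileSystem API than through java.nio.file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Giraph): routes output lines to one
 * file per group key. Each worker passes its own taskId so that parallel
 * workers writing into the same directory never open the same file.
 */
class GroupedLineWriter implements AutoCloseable {
  private final Path outputDir;
  private final int taskId;
  private final Map<String, BufferedWriter> writers = new HashMap<>();

  public GroupedLineWriter(Path outputDir, int taskId) throws IOException {
    this.outputDir = outputDir;
    this.taskId = taskId;
    Files.createDirectories(outputDir);
  }

  /** Appends one line to the file for the given group, opening it lazily. */
  public void write(String group, String line) throws IOException {
    BufferedWriter w = writers.get(group);
    if (w == null) {
      // File name is "<group>-<taskId>", e.g. "users-00003" for task 3.
      Path file = outputDir.resolve(String.format("%s-%05d", group, taskId));
      w = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
      writers.put(group, w);
    }
    w.write(line);
    w.newLine();
  }

  /** Flushes and closes every file this writer has opened. */
  @Override
  public void close() throws IOException {
    for (BufferedWriter w : writers.values()) {
      w.close();
    }
    writers.clear();
  }
}
```

A worker might use it like this (the directory, task ID, and vertex lines are made up for illustration):

    try (GroupedLineWriter out = new GroupedLineWriter(Paths.get("als-output"), 0)) {
        out.write("users", "1\t0.12 0.55");
        out.write("items", "42\t0.31 0.07");
    }

which would leave the files users-00000 and items-00000 in als-output. Opening the writers lazily means a worker only creates files for the groups it actually encounters.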
