Posted to user@giraph.apache.org by Claudio Martella <cl...@gmail.com> on 2012/02/06 12:24:38 UTC

the slides for my talk @ FOSDEM

Hello guys,

for those interested, here are the "slides" for my talk at FOSDEM.

http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/

The event was very nice, with a tight community and great interest in
Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
invited me to give the talk there. Jakob, are you still planning to talk there?
Maybe we can split the Kafka/Giraph talks?

Best,
Claudio

-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: the slides for my talk @ FOSDEM

Posted by André Kelpe <ef...@googlemail.com>.

2012/2/6 Claudio Martella <cl...@gmail.com>:
> Hello guys,
>
> for those interested, here are the "slides" for my talk at FOSDEM.
>
> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/

Nice slide deck, too bad I was too late at FOSDEM yesterday to see it...
Question: Is that the official Giraph logo? I was looking for one that I
want to include in an internal slide deck and could not find one.

Thanks!

André

Re: the slides for my talk @ FOSDEM

Posted by Claudio Martella <cl...@gmail.com>.

Sure, I'll wait until the video is up on YouTube and then add both.

On Monday, February 6, 2012, Jakob Homan <jg...@gmail.com> wrote:
> Also, Claudio, don't forget to add your presentation to the website:
> https://incubator.apache.org/giraph/
>
> On Mon, Feb 6, 2012 at 7:53 AM, Jakob Homan <jg...@gmail.com> wrote:
>>> Jakob, are you still planning to talk there? Maybe we can split Kafka/Giraph talks?
>> Yes, I've already submitted my Giraph talk.  I'm not really involved
>> with Kafka at the moment; if I go, it'll be on the Giraph talk...
>>
>>
>> On Mon, Feb 6, 2012 at 5:54 AM, Sebastian Schelter <ss...@apache.org> wrote:
>>> Hi Claudio,
>>>
>>> nice job with the slides! I have only one small point to criticize:
>>>
>>> When PageRank is implemented with MapReduce, it's not necessary to have
>>> the graph passed through in each iteration. Mahout for example uses
>>> power iterations where the adjacency matrix is multiplied by the
>>> pagerank vector and only that vector has to be sent over the network.
>>> Pegasus uses a similar approach.
>>>
>>> /s
>>>
>>>
>>>
>>> On 06.02.2012 12:24, Claudio Martella wrote:
>>>> Hello guys,
>>>>
>>>> for those interested, here are the "slides" for my talk at FOSDEM.
>>>>
>>>> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
>>>>
>>>> The event was very nice, a tight community and a great interest in
>>>> Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
>>>> invited the talk there. Jakob, are you still planning to talk there?
>>>> Maybe we can split Kafka/Giraph talks?
>>>>
>>>> Best,
>>>> Claudio
>>>>
>>>
>

-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: the slides for my talk @ FOSDEM

Posted by Jakob Homan <jg...@gmail.com>.

Also, Claudio, don't forget to add your presentation to the website:
https://incubator.apache.org/giraph/

On Mon, Feb 6, 2012 at 7:53 AM, Jakob Homan <jg...@gmail.com> wrote:
>> Jakob, are you still planning to talk there? Maybe we can split Kafka/Giraph talks?
> Yes, I've already submitted my Giraph talk.  I'm not really involved
> with Kafka at the moment; if I go, it'll be on the Giraph talk...
>
>
> On Mon, Feb 6, 2012 at 5:54 AM, Sebastian Schelter <ss...@apache.org> wrote:
>> Hi Claudio,
>>
>> nice job with the slides! I have only one small point to criticize:
>>
>> When PageRank is implemented with MapReduce, it's not necessary to have
>> the graph passed through in each iteration. Mahout for example uses
>> power iterations where the adjacency matrix is multiplied by the
>> pagerank vector and only that vector has to be sent over the network.
>> Pegasus uses a similar approach.
>>
>> /s
>>
>>
>>
>> On 06.02.2012 12:24, Claudio Martella wrote:
>>> Hello guys,
>>>
>>> for those interested, here are the "slides" for my talk at FOSDEM.
>>>
>>> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
>>>
>>> The event was very nice, a tight community and a great interest in
>>> Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
>>> invited the talk there. Jakob, are you still planning to talk there?
>>> Maybe we can split Kafka/Giraph talks?
>>>
>>> Best,
>>> Claudio
>>>
>>

Re: the slides for my talk @ FOSDEM

Posted by Claudio Martella <cl...@gmail.com>.

OK, weird. I asked and she mentioned she hasn't received any. Good luck.

On Monday, February 6, 2012, Jakob Homan <jg...@gmail.com> wrote:
>> Jakob, are you still planning to talk there? Maybe we can split Kafka/Giraph talks?
> Yes, I've already submitted my Giraph talk.  I'm not really involved
> with Kafka at the moment; if I go, it'll be on the Giraph talk...
>
>
> On Mon, Feb 6, 2012 at 5:54 AM, Sebastian Schelter <ss...@apache.org> wrote:
>> Hi Claudio,
>>
>> nice job with the slides! I have only one small point to criticize:
>>
>> When PageRank is implemented with MapReduce, it's not necessary to have
>> the graph passed through in each iteration. Mahout for example uses
>> power iterations where the adjacency matrix is multiplied by the
>> pagerank vector and only that vector has to be sent over the network.
>> Pegasus uses a similar approach.
>>
>> /s
>>
>>
>>
>> On 06.02.2012 12:24, Claudio Martella wrote:
>>> Hello guys,
>>>
>>> for those interested, here are the "slides" for my talk at FOSDEM.
>>>
>>> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
>>>
>>> The event was very nice, a tight community and a great interest in
>>> Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
>>> invited the talk there. Jakob, are you still planning to talk there?
>>> Maybe we can split Kafka/Giraph talks?
>>>
>>> Best,
>>> Claudio
>>>
>>
>

-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: the slides for my talk @ FOSDEM

Posted by Jakob Homan <jg...@gmail.com>.

> Jakob, are you still planning to talk there? Maybe we can split Kafka/Giraph talks?
Yes, I've already submitted my Giraph talk.  I'm not really involved
with Kafka at the moment; if I go, it'll be on the Giraph talk...


On Mon, Feb 6, 2012 at 5:54 AM, Sebastian Schelter <ss...@apache.org> wrote:
> Hi Claudio,
>
> nice job with the slides! I have only one small point to criticize:
>
> When PageRank is implemented with MapReduce, it's not necessary to have
> the graph passed through in each iteration. Mahout for example uses
> power iterations where the adjacency matrix is multiplied by the
> pagerank vector and only that vector has to be sent over the network.
> Pegasus uses a similar approach.
>
> /s
>
>
>
> On 06.02.2012 12:24, Claudio Martella wrote:
>> Hello guys,
>>
>> for those interested, here are the "slides" for my talk at FOSDEM.
>>
>> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
>>
>> The event was very nice, a tight community and a great interest in
>> Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
>> invited the talk there. Jakob, are you still planning to talk there?
>> Maybe we can split Kafka/Giraph talks?
>>
>> Best,
>> Claudio
>>
>

Re: running job with giraph dependency anomaly

Posted by Jakob Homan <jg...@gmail.com>.

Nothing that Giraph does should be influenced by 32-bit vs. 64-bit (basically;
very rare caveats apply, etc., etc.).  I'm still not clear on what error
you're encountering.  Your custom mapper sets everything GraphMapper
does, but then doesn't run?

On Tue, Feb 7, 2012 at 6:18 PM, David Garcia <dg...@potomacfusion.com> wrote:
> Yeah.  I haven't changed anything with the standard Giraph stuff.  I just
> made my own vertex and and VertexInputFormat.  We are in a 64bit
> environment. . .is it possible that building a jar with 32bit tools would
> be a problem?  I wouldn't think so, since that addressing
> native-dependency issues was sort of the *point* of java. . .but, this
> seems really odd to me.  Are there some dependency restrictions that I
> should know about?  We have to use Jackson 1.6 (because we use cloudera
> distribution of hadoop), and there are other libraries we use.  Thx again
> for the feedback.
>
> -David
>
> On 2/7/12 8:08 PM, "Avery Ching" <ac...@apache.org> wrote:
>
>>If you're using GiraphJob, the mapper class should be set for you.
>>That's weird.
>>
>>Avery
>>
>>On 2/7/12 5:58 PM, David Garcia wrote:
>>> That's interesting.  Yes, I don't need native libraries.  The problem I'm
>>> having is that after I run job.waitForCompletion(..),
>>> The job runs a mapper that is something other than GraphMapper.  It
>>> doesn't complain that a Mapper isn't defined or anything.  It runs
>>> something else.  As I mentioned below, the map-class doesn't appear to be
>>> defined.
>>>
>>>
>>> On 2/7/12 7:50 PM, "Jakob Homan"<jg...@gmail.com>  wrote:
>>>
>>>> That's not necessarily a bad thing.  Hadoop (not Giraph) has native
>>>> code library it can use for improved performance.  You'll see this
>>>> message when running on a cluster that's not been deployed to use the
>>>> native libraries.  If I follow what you wrote, most likely your work
>>>> project cluster is so configured.  Unless you actively expect to have
>>>> the native libraries loaded, I wouldn't be concerned.
>>>>
>>>>
>>>> On Tue, Feb 7, 2012 at 5:46 PM, David Garcia<dg...@potomacfusion.com>
>>>> wrote:
>>>>> I am running into a weird error that I haven't seen yet (I suppose I've
>>>>> been lucky).  I see the following in the logging:
>>>>>
>>>>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>>
>>>>>
>>>>> In the job definition, the property "mapreduce.map.class" is not even
>>>>> defined.  For Giraph, this is usually set to
>>>>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>>>>
>>>>> I'm building my project with hadoop 0.20.204.
>>>>>
>>>>> When I build the GiraphProject myself (and run my own tests with the
>>>>> projects dependencies), I have no problems.  The main difference is that
>>>>> I'm using a Giraph dependency in my work project.  All input is welcome.
>>>>> Thx!!
>>>>>
>>>>> -David
>>>>>
>>
>

Re: running job with giraph dependency anomaly

Posted by David Garcia <dg...@potomacfusion.com>.

Yeah.  I haven't changed anything with the standard Giraph stuff.  I just
made my own vertex and VertexInputFormat.  We are in a 64-bit
environment... is it possible that building a jar with 32-bit tools would
be a problem?  I wouldn't think so, since addressing
native-dependency issues was sort of the *point* of Java... but this
seems really odd to me.  Are there some dependency restrictions that I
should know about?  We have to use Jackson 1.6 (because we use the Cloudera
distribution of Hadoop), and there are other libraries we use.  Thx again
for the feedback.

-David

On 2/7/12 8:08 PM, "Avery Ching" <ac...@apache.org> wrote:

>If you're using GiraphJob, the mapper class should be set for you.
>That's weird.
>
>Avery
>
>On 2/7/12 5:58 PM, David Garcia wrote:
>> That's interesting.  Yes, I don't need native libraries.  The problem I'm
>> having is that after I run job.waitForCompletion(..),
>> The job runs a mapper that is something other than GraphMapper.  It
>> doesn't complain that a Mapper isn't defined or anything.  It runs
>> something else.  As I mentioned below, the map-class doesn't appear to be
>> defined.
>>
>>
>> On 2/7/12 7:50 PM, "Jakob Homan"<jg...@gmail.com>  wrote:
>>
>>> That's not necessarily a bad thing.  Hadoop (not Giraph) has native
>>> code library it can use for improved performance.  You'll see this
>>> message when running on a cluster that's not been deployed to use the
>>> native libraries.  If I follow what you wrote, most likely your work
>>> project cluster is so configured.  Unless you actively expect to have
>>> the native libraries loaded, I wouldn't be concerned.
>>>
>>>
>>> On Tue, Feb 7, 2012 at 5:46 PM, David Garcia<dg...@potomacfusion.com>
>>> wrote:
>>>> I am running into a weird error that I haven't seen yet (I suppose I've
>>>> been lucky).  I see the following in the logging:
>>>>
>>>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>>
>>>>
>>>> In the job definition, the property "mapreduce.map.class" is not even
>>>> defined.  For Giraph, this is usually set to
>>>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>>>
>>>> I'm building my project with hadoop 0.20.204.
>>>>
>>>> When I build the GiraphProject myself (and run my own tests with the
>>>> projects dependencies), I have no problems.  The main difference is that
>>>> I'm using a Giraph dependency in my work project.  All input is welcome.
>>>> Thx!!
>>>>
>>>> -David
>>>>
>

Re: running job with giraph dependency anomaly

Posted by Avery Ching <ac...@apache.org>.

If you're using GiraphJob, the mapper class should be set for you.  
That's weird.

Avery

On 2/7/12 5:58 PM, David Garcia wrote:
> That's interesting.  Yes, I don't need native libraries.  The problem I'm
> having is that after I run job.waitForCompletion(..),
> The job runs a mapper that is something other than GraphMapper.  It
> doesn't complain that a Mapper isn't defined or anything.  It runs
> something else.  As I mentioned below, the map-class doesn't appear to be
> defined.
>
>
> On 2/7/12 7:50 PM, "Jakob Homan"<jg...@gmail.com>  wrote:
>
>> That's not necessarily a bad thing.  Hadoop (not Giraph) has native
>> code library it can use for improved performance.  You'll see this
>> message when running on a cluster that's not been deployed to use the
>> native libraries.  If I follow what you wrote, most likely your work
>> project cluster is so configured.  Unless you actively expect to have
>> the native libraries loaded, I wouldn't be concerned.
>>
>>
>> On Tue, Feb 7, 2012 at 5:46 PM, David Garcia<dg...@potomacfusion.com>
>> wrote:
>>> I am running into a weird error that I haven't seen yet (I suppose I've
>>> been lucky).  I see the following in the logging:
>>>
>>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>>
>>>
>>> In the job definition, the property "mapreduce.map.class" is not even
>>> defined.  For Giraph, this is usually set to
>>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>>
>>> I'm building my project with hadoop 0.20.204.
>>>
>>> When I build the GiraphProject myself (and run my own tests with the
>>> projects dependencies), I have no problems.  The main difference is that
>>> I'm using a Giraph dependency in my work project.  All input is welcome.
>>> Thx!!
>>>
>>> -David
>>>

Re: running job with giraph dependency anomaly

Posted by David Garcia <dg...@potomacfusion.com>.

That's interesting.  Yes, I don't need native libraries.  The problem I'm
having is that after I run job.waitForCompletion(..),
the job runs a mapper that is something other than GraphMapper.  It
doesn't complain that a Mapper isn't defined or anything.  It runs
something else.  As I mentioned below, the map-class doesn't appear to be
defined.


On 2/7/12 7:50 PM, "Jakob Homan" <jg...@gmail.com> wrote:

>That's not necessarily a bad thing.  Hadoop (not Giraph) has native
>code library it can use for improved performance.  You'll see this
>message when running on a cluster that's not been deployed to use the
>native libraries.  If I follow what you wrote, most likely your work
>project cluster is so configured.  Unless you actively expect to have
>the native libraries loaded, I wouldn't be concerned.
>
>
>On Tue, Feb 7, 2012 at 5:46 PM, David Garcia <dg...@potomacfusion.com>
>wrote:
>> I am running into a weird error that I haven't seen yet (I suppose I've
>> been lucky).  I see the following in the logging:
>>
>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>>
>>
>> In the job definition, the property "mapreduce.map.class" is not even
>> defined.  For Giraph, this is usually set to
>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>
>> I'm building my project with hadoop 0.20.204.
>>
>> When I build the GiraphProject myself (and run my own tests with the
>> projects dependencies), I have no problems.  The main difference is that
>> I'm using a Giraph dependency in my work project.  All input is welcome.
>> Thx!!
>>
>> -David
>>

Re: running job with giraph dependency anomaly

Posted by Jakob Homan <jg...@gmail.com>.

That's not necessarily a bad thing.  Hadoop (not Giraph) has a native
code library it can use for improved performance.  You'll see this
message when running on a cluster that's not been deployed to use the
native libraries.  If I follow what you wrote, most likely your work
project cluster is so configured.  Unless you actively expect to have
the native libraries loaded, I wouldn't be concerned.

On Tue, Feb 7, 2012 at 5:46 PM, David Garcia <dg...@potomacfusion.com> wrote:
> I am running into a weird error that I haven't seen yet (I suppose I've
> been lucky).  I see the following in the logging:
>
> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
>
> In the job definition, the property "mapreduce.map.class" is not even
> defined.  For Giraph, this is usually set to
> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>
> I'm building my project with hadoop 0.20.204.
>
> When I build the GiraphProject myself (and run my own tests with the
> projects dependencies), I have no problems.  The main difference is that
> I'm using a Giraph dependency in my work project.  All input is welcome.
> Thx!!
>
> -David
>

Re: Giraph Architecture bug in

Posted by Avery Ching <ac...@apache.org>.

AFAIK we don't have any SOP for opening issues.  Maybe I'll take a crack 
at this one tonight if I find some time, unless you were planning to 
work on it David.

Avery

On 2/8/12 5:46 PM, David Garcia wrote:
> I opened up
>
> * GIRAPH-144<https://issues.apache.org/jira/browse/GIRAPH-144>
>
>
> I apologize if I didn't do it up according to project SOP's.  I haven't
> had time to read it thoroughly.
>
> -David
>
>
> On 2/8/12 7:29 PM, "David Garcia"<dg...@potomacfusion.com>  wrote:
>
>> Yeah, I'll write something up.
>>
>>
>> On 2/8/12 7:26 PM, "Avery Ching"<ac...@apache.org>  wrote:
>>
>>> Since we call waitForCompletion() (which calls submit() internally) in
>>> GiraphJob#run(), we cannot override those methods.  A better fix would
>>> probably be to use composition rather than inheritance (i.e.
>>>
>>> public class GiraphJob {
>>>      Job internalJob;
>>> }
>>>
>>> and expose the methods we would like as necessary.  There are other
>>> methods we don't want the user to call, (i.e. setMapperClass(), etc.).
>>> David, can you please open an issue for this?
>>>
>>> Avery
>>>
>>> On 2/8/12 5:17 PM, David Garcia wrote:
>>>> This is a very subtle bug.  GiraphJob inherits from
>>>> org.apache.mapreduce.Job.  However, the methods submit() and
>>>> waitForCompletion() are not overridden.  I assumed that they were
>>>> implemented, so when I called either one of these methods, the framework
>>>> started up identity mappers/reducers.  A simple fix is to throw
>>>> unsupported operation exceptions or to implement these methods.  Perhaps
>>>> this has been done already?
>>>>
>>>> -David
>>>>
>>>> On 2/7/12 7:46 PM, "David Garcia"<dg...@potomacfusion.com>   wrote:
>>>>
>>>>> I am running into a weird error that I haven't seen yet (I suppose I've
>>>>> been lucky).  I see the following in the logging:
>>>>>
>>>>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>>
>>>>>
>>>>> In the job definition, the property "mapreduce.map.class" is not even
>>>>> defined.  For Giraph, this is usually set to
>>>>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>>>>
>>>>> I'm building my project with hadoop 0.20.204.
>>>>>
>>>>> When I build the GiraphProject myself (and run my own tests with the
>>>>> projects dependencies), I have no problems.  The main difference is that
>>>>> I'm using a Giraph dependency in my work project.  All input is welcome.
>>>>> Thx!!
>>>>>
>>>>> -David
>>>>>

Re: Giraph Architecture bug in

Posted by David Garcia <dg...@potomacfusion.com>.

I opened up 

* GIRAPH-144 <https://issues.apache.org/jira/browse/GIRAPH-144>


I apologize if I didn't write it up according to the project's SOPs.  I haven't
had time to read them thoroughly.

-David


On 2/8/12 7:29 PM, "David Garcia" <dg...@potomacfusion.com> wrote:

>Yeah, I'll write something up.
>
>
>On 2/8/12 7:26 PM, "Avery Ching" <ac...@apache.org> wrote:
>
>>Since we call waitForCompletion() (which calls submit() internally) in
>>GiraphJob#run(), we cannot override those methods.  A better fix would
>>probably be to use composition rather than inheritance (i.e.
>>
>>public class GiraphJob {
>>     Job internalJob;
>>}
>>
>>and expose the methods we would like as necessary.  There are other
>>methods we don't want the user to call, (i.e. setMapperClass(), etc.).
>>David, can you please open an issue for this?
>>
>>Avery
>>
>>On 2/8/12 5:17 PM, David Garcia wrote:
>>> This is a very subtle bug.  GiraphJob inherits from
>>> org.apache.mapreduce.Job.  However, the methods submit() and
>>> waitForCompletion() are not overridden.  I assumed that they were
>>> implemented, so when I called either one of these methods, the framework
>>> started up identity mappers/reducers.  A simple fix is to throw
>>> unsupported operation exceptions or to implement these methods.  Perhaps
>>> this has been done already?
>>>
>>> -David
>>>
>>> On 2/7/12 7:46 PM, "David Garcia"<dg...@potomacfusion.com>  wrote:
>>>
>>>> I am running into a weird error that I haven't seen yet (I suppose I've
>>>> been lucky).  I see the following in the logging:
>>>>
>>>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>>
>>>>
>>>> In the job definition, the property "mapreduce.map.class" is not even
>>>> defined.  For Giraph, this is usually set to
>>>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>>>
>>>> I'm building my project with hadoop 0.20.204.
>>>>
>>>> When I build the GiraphProject myself (and run my own tests with the
>>>> projects dependencies), I have no problems.  The main difference is that
>>>> I'm using a Giraph dependency in my work project.  All input is welcome.
>>>> Thx!!
>>>>
>>>> -David
>>>>
>>
>

Re: Giraph Architecture bug in

Posted by David Garcia <dg...@potomacfusion.com>.

Yeah, I'll write something up.


On 2/8/12 7:26 PM, "Avery Ching" <ac...@apache.org> wrote:

>Since we call waitForCompletion() (which calls submit() internally) in
>GiraphJob#run(), we cannot override those methods.  A better fix would
>probably be to use composition rather than inheritance (i.e.
>
>public class GiraphJob {
>     Job internalJob;
>}
>
>and expose the methods we would like as necessary.  There are other
>methods we don't want the user to call, (i.e. setMapperClass(), etc.).
>David, can you please open an issue for this?
>
>Avery
>
>On 2/8/12 5:17 PM, David Garcia wrote:
>> This is a very subtle bug.  GiraphJob inherits from
>> org.apache.mapreduce.Job.  However, the methods submit() and
>> waitForCompletion() are not overridden.  I assumed that they were
>> implemented, so when I called either one of these methods, the framework
>> started up identity mappers/reducers.  A simple fix is to throw
>> unsupported operation exceptions or to implement these methods.  Perhaps
>> this has been done already?
>>
>> -David
>>
>> On 2/7/12 7:46 PM, "David Garcia"<dg...@potomacfusion.com>  wrote:
>>
>>> I am running into a weird error that I haven't seen yet (I suppose I've
>>> been lucky).  I see the following in the logging:
>>>
>>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>>
>>>
>>> In the job definition, the property "mapreduce.map.class" is not even
>>> defined.  For Giraph, this is usually set to
>>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>>
>>> I'm building my project with hadoop 0.20.204.
>>>
>>> When I build the GiraphProject myself (and run my own tests with the
>>> projects dependencies), I have no problems.  The main difference is that
>>> I'm using a Giraph dependency in my work project.  All input is welcome.
>>> Thx!!
>>>
>>> -David
>>>
>

Re: Giraph Architecture bug in

Posted by Avery Ching <ac...@apache.org>.

Since we call waitForCompletion() (which calls submit() internally) in 
GiraphJob#run(), we cannot override those methods.  A better fix would 
probably be to use composition rather than inheritance (i.e.

public class GiraphJob {
     Job internalJob;
}

and expose the methods we would like as necessary.  There are other 
methods we don't want the user to call, (i.e. setMapperClass(), etc.).  
David, can you please open an issue for this?
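
A minimal sketch of the composition idea above, for readers following along.
It is not Giraph's or Hadoop's code: WrappedJob is a hypothetical stand-in
for Hadoop's Job, the mapper is modelled as a plain string, and the method
names on ComposedGiraphJob are made up for illustration. The point is only
that callers can no longer reach setMapperClass() or waitForCompletion() on
the wrapped job.

```java
/** Hypothetical stand-in for Hadoop's Job class; not the real API. */
class WrappedJob {
    private String mapperClass = "IdentityMapper"; // what runs if nothing is set
    private String jobName = "";

    void setMapperClass(String mapperClass) { this.mapperClass = mapperClass; }
    String getMapperClass() { return mapperClass; }
    void setJobName(String jobName) { this.jobName = jobName; }
    String getJobName() { return jobName; }

    /** Pretends to submit the job and block until it finishes. */
    boolean waitForCompletion(boolean verbose) {
        if (verbose) {
            System.out.println("running " + jobName + " with " + mapperClass);
        }
        return true;
    }
}

/** Holds a job instead of extending it, so only chosen methods are visible. */
final class ComposedGiraphJob {
    private final WrappedJob internalJob = new WrappedJob();

    ComposedGiraphJob(String jobName) {
        internalJob.setJobName(jobName);
    }

    /** Deliberately exposed: harmless for users to read. */
    String getJobName() { return internalJob.getJobName(); }

    /** Exposed for the demo only, to show what run() configured. */
    String configuredMapper() { return internalJob.getMapperClass(); }

    /** The single entry point: always installs the graph mapper first. */
    boolean run(boolean verbose) {
        internalJob.setMapperClass("GraphMapper");
        return internalJob.waitForCompletion(verbose);
    }
}

final class CompositionDemo {
    public static void main(String[] args) {
        ComposedGiraphJob job = new ComposedGiraphJob("pagerank");
        job.run(true);
        // job.waitForCompletion(true) or job.setMapperClass(...) would not compile here.
    }
}
```

The trade-off is that every job method users legitimately need has to be
forwarded by hand.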

Avery

On 2/8/12 5:17 PM, David Garcia wrote:
> This is a very subtle bug.  GiraphJob inherits from
> org.apache.mapreduce.Job.  However, the methods submit() and
> waitForCompletion() are not overridden.  I assumed that they were
> implemented, so when I called either one of these methods, the framework
> started up identity mappers/reducers.  A simple fix is to throw
> unsupported operation exceptions or to implement these methods.  Perhaps
> this has been done already?
>
> -David
>
> On 2/7/12 7:46 PM, "David Garcia"<dg...@potomacfusion.com>  wrote:
>
>> I am running into a weird error that I haven't seen yet (I suppose I've
>> been lucky).  I see the following in the logging:
>>
>> org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>>
>>
>> In the job definition, the property "mapreduce.map.class" is not even
>> defined.  For Giraph, this is usually set to
>> "mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>>
>> I'm building my project with hadoop 0.20.204.
>>
>> When I build the GiraphProject myself (and run my own tests with the
>> projects dependencies), I have no problems.  The main difference is that
>> I'm using a Giraph dependency in my work project.  All input is welcome.
>> Thx!!
>>
>> -David
>>

Giraph Architecture bug in

Posted by David Garcia <dg...@potomacfusion.com>.

This is a very subtle bug.  GiraphJob inherits from
org.apache.hadoop.mapreduce.Job.  However, the methods submit() and
waitForCompletion() are not overridden.  I assumed that they were
implemented, so when I called either one of these methods, the framework
started up identity mappers/reducers.  A simple fix is to throw
unsupported operation exceptions or to implement these methods.  Perhaps
this has been done already?
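
For anyone reading the archive later, here is a small sketch of the pitfall
described above. The classes are hypothetical stand-ins (SketchJob for
Hadoop's Job, InheritingGiraphJob for GiraphJob), the mapper is just a
string, and the assumption that only run() installs the graph mapper is
mine, made to reproduce the symptom.

```java
/** Hypothetical stand-in for Hadoop's Job class; not the real API. */
class SketchJob {
    // Default used when the caller never sets a mapper.
    protected String mapperClass = "IdentityMapper";

    void submit() {
        System.out.println("submitting with mapper " + mapperClass);
    }

    boolean waitForCompletion(boolean verbose) {
        submit();
        return true;
    }

    String getMapperClass() { return mapperClass; }
}

/** Hypothetical stand-in for a job class that extends the base job. */
class InheritingGiraphJob extends SketchJob {
    /** Assumed for this sketch: only this entry point installs the graph mapper. */
    boolean run(boolean verbose) {
        mapperClass = "GraphMapper";
        return waitForCompletion(verbose);
    }
}

final class InheritancePitfallDemo {
    public static void main(String[] args) {
        InheritingGiraphJob viaInherited = new InheritingGiraphJob();
        viaInherited.waitForCompletion(true); // compiles fine, but skips the setup in run()

        InheritingGiraphJob viaRun = new InheritingGiraphJob();
        viaRun.run(true);                     // the intended entry point
    }
}
```

Nothing fails at compile time or at submit time on the first path, which is
what makes the mistake easy to miss.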

-David

On 2/7/12 7:46 PM, "David Garcia" <dg...@potomacfusion.com> wrote:

>I am running into a weird error that I haven't seen yet (I suppose I've
>been lucky).  I see the following in the logging:
>
>org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
>library for your platform... using builtin-java classes where applicable
>
>
>In the job definition, the property "mapreduce.map.class" is not even
>defined.  For Giraph, this is usually set to
>"mapreduce.map.class=org.apache.giraph.graph.GraphMapper"
>
>I'm building my project with hadoop 0.20.204.
>
>When I build the GiraphProject myself (and run my own tests with the
>projects dependencies), I have no problems.  The main difference is that
>I'm using a Giraph dependency in my work project.  All input is welcome.
>Thx!!
>
>-David
>

running job with giraph dependency anomaly

Posted by David Garcia <dg...@potomacfusion.com>.

I am running into a weird error that I haven't seen yet (I suppose I've
been lucky).  I see the following in the logging:

org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable


In the job definition, the property "mapreduce.map.class" is not even
defined.  For Giraph, this is usually set to
"mapreduce.map.class=org.apache.giraph.graph.GraphMapper"

I'm building my project with hadoop 0.20.204.

When I build the Giraph project myself (and run my own tests with the
project's dependencies), I have no problems.  The main difference is that
I'm using a Giraph dependency in my work project.  All input is welcome.
Thx!!

-David

Re: the slides for my talk @ FOSDEM

Posted by Claudio Martella <cl...@gmail.com>.

Hi Sebastian,

thanks for your feedback on the slides.

As a matter of fact, I'm aware of the Pegasus matrix-based optimization
as well as the Schimmy technique by Jimmy Lin. I thought that this kind of
technique is general enough for all iterative graph algorithms, not
just PageRank, and mostly that using the naive algorithm would help me
explain things in the presentation. Messaging the adjacent vertices from the
Mapper by iterating over them and emitting (otherVertex, myPartialPR)
maps easily to our messaging paradigm. I'll maybe make it clearer
in the next presentation.
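
To make the naive scheme concrete, here is a rough single-machine sketch of
one iteration. It uses neither the Hadoop nor the Giraph API: the Message
class, the emit/combine split, and the damping factor passed in main are
assumptions for illustration, and it assumes every vertex has at least one
outgoing edge.

```java
final class NaiveMessagingPageRank {

    /** One emitted (otherVertex, myPartialPR) pair. */
    static final class Message {
        final int target;
        final double partialRank;

        Message(int target, double partialRank) {
            this.target = target;
            this.partialRank = partialRank;
        }
    }

    /** Send phase: each vertex emits rank / outDegree to every neighbour. */
    static Message[] emit(int[][] outLinks, double[] rank) {
        int count = 0;
        for (int[] links : outLinks) {
            count += links.length;
        }
        Message[] messages = new Message[count];
        int next = 0;
        for (int v = 0; v < outLinks.length; v++) {
            double partial = rank[v] / outLinks[v].length;
            for (int other : outLinks[v]) {
                messages[next++] = new Message(other, partial);
            }
        }
        return messages;
    }

    /** Receive phase: each vertex sums the partial ranks addressed to it. */
    static double[] combine(Message[] messages, int vertexCount, double damping) {
        double[] sums = new double[vertexCount];
        for (Message m : messages) {
            sums[m.target] += m.partialRank;
        }
        double[] updated = new double[vertexCount];
        for (int v = 0; v < vertexCount; v++) {
            updated[v] = (1.0 - damping) / vertexCount + damping * sums[v];
        }
        return updated;
    }

    public static void main(String[] args) {
        int[][] outLinks = { {1, 2}, {2}, {0} };
        double[] rank = { 1.0 / 3, 1.0 / 3, 1.0 / 3 };
        double[] updated = combine(emit(outLinks, rank), outLinks.length, 0.85);
        for (int v = 0; v < updated.length; v++) {
            System.out.printf("vertex %d: %.4f%n", v, updated[v]);
        }
    }
}
```

In a vertex-centric model the send phase corresponds to a vertex messaging
its neighbours, and the receive phase to the same vertex reading its inbox
in the following step.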

Thanks!

On Mon, Feb 6, 2012 at 2:54 PM, Sebastian Schelter <ss...@apache.org> wrote:
> Hi Claudio,
>
> nice job with the slides! I have only one small point to criticize:
>
> When PageRank is implemented with MapReduce, it's not necessary to have
> the graph passed through in each iteration. Mahout for example uses
> power iterations where the adjacency matrix is multiplied by the
> pagerank vector and only that vector has to be sent over the network.
> Pegasus uses a similar approach.
>
> /s
>
>
>
> On 06.02.2012 12:24, Claudio Martella wrote:
>> Hello guys,
>>
>> for those interested, here are the "slides" for my talk at FOSDEM.
>>
>> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
>>
>> The event was very nice, a tight community and a great interest in
>> Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
>> invited the talk there. Jakob, are you still planning to talk there?
>> Maybe we can split Kafka/Giraph talks?
>>
>> Best,
>> Claudio
>>
>

-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: the slides for my talk @ FOSDEM

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Claudio,

nice job with the slides! I have only one small point to criticize:

When PageRank is implemented with MapReduce, it's not necessary to have
the graph passed through in each iteration. Mahout, for example, uses
power iterations where the adjacency matrix is multiplied by the
PageRank vector and only that vector has to be sent over the network.
Pegasus uses a similar approach.
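
As a rough illustration of the power-iteration idea (a single-machine
sketch, not Mahout's or Pegasus's actual code): the link structure stays
fixed and only the rank vector is replaced on each pass. The damping
parameter, the iteration count, and the even redistribution of rank from
vertices without outgoing links are assumptions of this sketch.

```java
final class PowerIterationSketch {

    /**
     * Repeatedly multiplies the (implicit) transition matrix by the rank vector.
     * outLinks[v] lists the vertices that v links to and is never modified.
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        for (int v = 0; v < n; v++) {
            rank[v] = 1.0 / n; // start from the uniform distribution
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by vertices with no out-links
            for (int v = 0; v < n; v++) {
                if (outLinks[v].length == 0) {
                    dangling += rank[v];
                    continue;
                }
                double share = rank[v] / outLinks[v].length;
                for (int target : outLinks[v]) {
                    next[target] += share;
                }
            }
            for (int v = 0; v < n; v++) {
                next[v] = (1.0 - damping) / n + damping * (next[v] + dangling / n);
            }
            rank = next; // only this vector changes between iterations
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] outLinks = { {1, 2}, {2}, {0} };
        double[] rank = pageRank(outLinks, 0.85, 50);
        for (int v = 0; v < rank.length; v++) {
            System.out.printf("vertex %d: %.4f%n", v, rank[v]);
        }
    }
}
```

In a distributed setting the matrix partitions would stay where they are and
only the vector would be shipped, which is the saving described above.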

/s

On 06.02.2012 12:24, Claudio Martella wrote:
> Hello guys,
> 
> for those interested, here are the "slides" for my talk at FOSDEM.
> 
> http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
> 
> The event was very nice, a tight community and a great interest in
> Giraph. Isabel Drost, one of the organizers of Berlin Buzzwords,
> invited the talk there. Jakob, are you still planning to talk there?
> Maybe we can split Kafka/Giraph talks?
> 
> Best,
> Claudio
>