Posted to user@giraph.apache.org by Avery Ching <ac...@apache.org> on 2012/04/01 00:32:26 UTC

Re: Incomplete output when running PageRank example

As Benjamin mentioned, it depends on the number of map tasks your Hadoop 
install is running with.  You could set it proportionally to the number 
of cores the machine has if you like, but try using Benjamin's suggestions 
to get it working with more map tasks.  I believe that if you don't set it 
explicitly, the default is 2, which is not enough for 2 workers plus the 
coordinating master task.
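To make the slot arithmetic concrete, here is a small sketch (illustrative only, not Giraph code): a Giraph job occupies one map task for the coordinating master plus one map task per worker, so the configured slot maximum must be at least the worker count plus one.

```python
# Sketch of the slot arithmetic described above (not Giraph code):
# a Giraph job needs one map task for the coordinating master
# plus one map task per worker requested via -w.
def slots_needed(workers):
    return workers + 1

DEFAULT_SLOTS = 2  # Hadoop's default mapred.tasktracker.map.tasks.maximum

print(slots_needed(2) <= DEFAULT_SLOTS)  # False: -w 2 needs 3 slots
print(slots_needed(1) <= DEFAULT_SLOTS)  # True: -w 1 fits in the default
```

This is why the job above launched only 2 map tasks and then stalled at 33%: the master and one worker got slots, but the second worker never started.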

Avery

On 3/31/12 11:51 AM, Robert Davis wrote:
> Thanks a lot, Benjamin.
>
> I set the number of map tasks to 2 since I only have a dual-core 
> processor (though with hyperthreading) on my laptop. I ran it again, 
> but it still appeared incorrect. The output is as follows.
>
> Regards,
> Robert
>
> $ hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 
> -w 2
> 12/03/31 11:40:08 INFO benchmark.PageRankBenchmark: Using class 
> org.apache.giraph.benchmark.HashMapVertexPageRankBenchmark
> 12/03/31 11:40:10 WARN bsp.BspOutputFormat: checkOutputSpecs: 
> ImmutableOutputCommiter will not check anything
> 12/03/31 11:40:11 INFO mapred.JobClient: Running job: 
> job_201203301834_0004
> 12/03/31 11:40:12 INFO mapred.JobClient:  map 0% reduce 0%
> 12/03/31 11:40:38 INFO mapred.JobClient:  map 33% reduce 0%
> 12/03/31 11:45:44 INFO mapred.JobClient: Job complete: 
> job_201203301834_0004
> 12/03/31 11:45:44 INFO mapred.JobClient: Counters: 5
> 12/03/31 11:45:44 INFO mapred.JobClient:   Job Counters
> 12/03/31 11:45:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=620769
> 12/03/31 11:45:44 INFO mapred.JobClient:     Total time spent by all 
> reduces waiting after reserving slots (ms)=0
> 12/03/31 11:45:44 INFO mapred.JobClient:     Total time spent by all 
> maps waiting after reserving slots (ms)=0
> 12/03/31 11:45:44 INFO mapred.JobClient:     Launched map tasks=2
> 12/03/31 11:45:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=4377
>
> On Sat, Mar 31, 2012 at 3:45 AM, Benjamin Heitmann 
> <benjamin.heitmann@deri.org <ma...@deri.org>> wrote:
>
>
>     Hi Robert,
>
>     On 31 Mar 2012, at 09:42, Robert Davis wrote:
>
>     > Hello Giraphers,
>     >
>     > I am new to Giraph. I just checked out a version and ran it in
>     > single-machine mode. I got the following results, which have no
>     > Giraph counter information (as in the example output). I am
>     > wondering what has gone wrong. The Hadoop I am using is 1.0.
>
>     It looks like your Giraph job did not actually finish the calculation.
>
>     As you say that you are new to Giraph, there is a good chance that
>     you ran into the same issue which tripped me up a few weeks ago ;)
>
>     (I am not sure where the following information should be documented;
>     maybe this issue should be described on the same page that explains
>     how to run the PageRank benchmark.)
>
>     You provide the parameter "-w 30" to your job, which means that it
>     will use 30 workers. Maybe that's from the example on the Giraph
>     web page; however, there is one very important caveat for the
>     number of workers: the number of workers can be at most
>     mapred.tasktracker.map.tasks.maximum minus one.
>
>     Giraph uses one map task to run a coordinating master (probably
>     something ZooKeeper specific), and then it starts the number of
>     workers which you specified using -w. If the total number of tasks
>     is bigger than the maximum number of map tasks, your Giraph job
>     will never actually finish the calculation.
>     (There might be a config option for specifying how many workers
>     need to be finished in order to start the next superstep, but I
>     did not try that personally.)
>
>     If you are running Hadoop/Giraph on your personal machine, then I
>     would recommend using 3 workers, and you should edit your
>     conf/mapred-site.xml to include values for the following
>     configuration parameters (and then restart Hadoop):
>
>     <property>
>       <name>mapred.map.tasks</name>
>       <value>4</value>
>     </property>
>     <property>
>       <name>mapred.reduce.tasks</name>
>       <value>4</value>
>     </property>
>     <property>
>       <name>mapred.tasktracker.map.tasks.maximum</name>
>       <value>4</value>
>     </property>
>     <property>
>       <name>mapred.tasktracker.reduce.tasks.maximum</name>
>       <value>4</value>
>     </property>
>
>
>
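The quoted config caps map slots at 4, which leaves room for 3 workers plus the master. As a quick sanity check before launching a job, you could parse your mapred-site.xml and compute how many workers fit; the helper below is a hypothetical sketch, not part of Giraph or Hadoop.

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Hypothetical helper: read mapred.tasktracker.map.tasks.maximum
# from a mapred-site.xml and report the largest usable -w value,
# remembering that one slot goes to the coordinating master task.
CONF = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>"""

def max_workers(conf_xml):
    root = ET.parse(StringIO(conf_xml)).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "mapred.tasktracker.map.tasks.maximum":
            return int(prop.findtext("value")) - 1  # minus one master slot
    return 2 - 1  # property unset: Hadoop's default maximum is 2

print(max_workers(CONF))  # with the 4-slot config quoted above: 3
```

In a real setup you would point this at `$HADOOP_HOME/conf/mapred-site.xml`; the fallback branch also shows why an untouched install supports only a single worker.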


Re: Incomplete output when running PageRank example

Posted by Robert Davis <dw...@gmail.com>.
Thanks Avery. I've solved the problem by setting #workers to 1.

Robert
