Posted to user@giraph.apache.org by Jyoti Yadav <ra...@gmail.com> on 2014/02/01 19:14:02 UTC

Re: constraint about no of supersteps

Hi Claudio,

As I mentioned in previous posts, I got an error while running the Giraph job
with the checkpointing feature on. I was able to fix one of the errors, shown
below:

Task Id : attempt_201401310947_0001_m_000001_0, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hduser/_bsp/_checkpoints/job_201401310947_0001/4.kanha-Vostro-1014_1.metadata
could only be replicated to 0 nodes, instead of 1
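
Since low disk space under /app/hadoop/tmp/dfs/name/data was the suspected
cause of this error (see the quoted message below), here is a minimal sketch of
how free space on a local directory could be checked from Java. It uses only
the standard library, not the Hadoop API. The class name DiskSpaceCheck and the
method usableFraction are illustrative assumptions, not part of Hadoop or
Giraph.

```java
import java.io.File;

public class DiskSpaceCheck {
    /**
     * Returns the usable fraction (0.0 to 1.0) of the partition holding dir,
     * or -1.0 if the size of the partition cannot be determined.
     */
    static double usableFraction(File dir) {
        long total = dir.getTotalSpace(); // 0 if the path does not name a partition
        if (total == 0L) {
            return -1.0;
        }
        return (double) dir.getUsableSpace() / total;
    }

    public static void main(String[] args) {
        // Default path is the one from the quoted message; pass another
        // directory as the first argument to check it instead.
        File dir = new File(args.length > 0 ? args[0] : "/app/hadoop/tmp/dfs/name/data");
        double fraction = usableFraction(dir);
        if (fraction < 0) {
            System.out.println("Cannot determine free space for " + dir);
        } else {
            System.out.printf("%s: %.1f%% usable%n", dir, fraction * 100);
        }
    }
}
```

This only reports local disk usage on the machine where it runs. It does not
say how much of that space HDFS is configured to use.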


Then I ran the Giraph job again. This time it failed with the following
error:





14/02/01 23:12:33 INFO job.GiraphJob: run: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201402012227_0003
14/02/01 23:12:58 INFO
job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
writeHaltInstructions: To halt after next superstep execute:
'bin/halt-application --zkServer kanha-Vostro-1014:22181 --zkNode
/_hadoopBsp/job_201402012227_0003/_haltComputation'
14/02/01 23:12:58 INFO mapred.JobClient: Running job: job_201402012227_0003
14/02/01 23:12:59 INFO mapred.JobClient:  map 50% reduce 0%
14/02/01 23:13:02 INFO mapred.JobClient:  map 100% reduce 0%
14/02/01 23:13:30 INFO mapred.JobClient:  map 50% reduce 0%
14/02/01 23:13:38 INFO mapred.JobClient: Task Id :
attempt_201402012227_0003_m_000000_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

attempt_201402012227_0003_m_000000_0: SLF4J: Class path contains multiple
SLF4J bindings.
attempt_201402012227_0003_m_000000_0: SLF4J: Found binding in
[file:/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201402012227_0003/jars/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000000_0: SLF4J: Found binding in
[jar:file:/usr/local/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000000_0: SLF4J: See
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201402012227_0003_m_000000_0: SLF4J: Actual binding is of type
[org.slf4j.impl.Log4jLoggerFactory]
14/02/01 23:13:54 INFO mapred.JobClient:  map 100% reduce 0%
14/02/01 23:23:48 INFO mapred.JobClient: Task Id :
attempt_201402012227_0003_m_000001_0, Status : FAILED
java.lang.IllegalStateException: run: Caught an unrecoverable exception
createExt: Failed to create
/_hadoopBsp/job_201402012227_0003/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir/kanha-Vostro-1014_1
after 3 tries!
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: createExt: Failed to create
/_hadoopBsp/job_201402012227_0003/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir/kanha-Vostro-1014_1
after 3 tries!
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:182)
    at
org.apache.giraph.worker.BspServiceWorker.writeFinshedSuperstepInfoToZK(BspServiceWorker.java:899)
    at
org.apache.giraph.worker.BspServiceWorker.finishSuperstep(BspServiceWorker.java:769)
    at
org.apache.giraph.graph.GraphTaskManager.completeSuperstepAndCollectStats(GraphTaskManager.java:398)
    at
org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:289)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
    ... 7 more

Task attempt_201402012227_0003_m_000001_0 failed to report status for 601
seconds. Killing!
attempt_201402012227_0003_m_000001_0: SLF4J: Class path contains multiple
SLF4J bindings.
attempt_201402012227_0003_m_000001_0: SLF4J: Found binding in
[file:/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201402012227_0003/jars/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000001_0: SLF4J: Found binding in
[jar:file:/usr/local/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000001_0: SLF4J: See
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201402012227_0003_m_000001_0: SLF4J: Actual binding is of type
[org.slf4j.impl.Log4jLoggerFactory]
attempt_201402012227_0003_m_000001_0: log4j:WARN No appenders could be
found for logger (org.apache.zookeeper.ClientCnxn).
attempt_201402012227_0003_m_000001_0: log4j:WARN Please initialize the
log4j system properly.
14/02/01 23:23:49 INFO mapred.JobClient:  map 50% reduce 0%
14/02/01 23:24:03 INFO mapred.JobClient: Job complete: job_201402012227_0003
14/02/01 23:24:03 INFO mapred.JobClient: Counters: 5
14/02/01 23:24:03 INFO mapred.JobClient:   Job Counters
14/02/01 23:24:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1295731
14/02/01 23:24:03 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
14/02/01 23:24:03 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
14/02/01 23:24:03 INFO mapred.JobClient:     Launched map tasks=4
14/02/01 23:24:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0



Seeking your suggestions.

Jyoti



On Fri, Jan 31, 2014 at 12:41 PM, Jyoti Yadav <ra...@gmail.com> wrote:

> Thanks, Claudio, for your reply.
> I think the problem is caused by low disk space:
> the /app/hadoop/tmp/dfs/name/data directory is almost full.
>
> Should I format my namenode? Will that cause any problems?
> I know that if I format it, I will lose all my data in HDFS.
> Before formatting, I will back up all the input files used to run the
> Giraph job.
>
> Seeking your suggestions.
> Thanks
>
>
> On Fri, Jan 31, 2014 at 10:47 AM, Claudio Martella <
> claudio.martella@gmail.com> wrote:
>
>>
>> On Fri, Jan 31, 2014 at 5:58 AM, Jyoti Yadav <ra...@gmail.com> wrote:
>>
>>> could only be replicated to 0 nodes, instead of 1
>>
>>
>> This is not a problem related to Giraph, but to HDFS. Please see
>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>
>>
>> --
>>    Claudio Martella
>>
>>
>
>