Posted to user@giraph.apache.org by Anirudh Perugu <an...@stonybrook.edu> on 2016/03/19 23:56:47 UTC

Settings : number of workers, heap size. jps showing java tasks even after killing application.

Hello All,

I am a Giraph newbie, so kindly bear with me. I am trying to run BFS on a
graph which has 28048 edges and 786 nodes.

*Here is my Setup :*
Single-node cluster, 8 GB RAM, 4 cores, Apache YARN 2.7.2, Giraph 1.2.0. My
single machine has everything (YARN + Giraph) running on it.

*1. How many workers can I have?*
I ask because my Giraph job runs fine with the settings *-w 1 -ca
giraph.SplitMasterWorker=false*

"Testing Results Table"

no. of workers | max no. of containers used | time taken for completion
1              | 3 (as seen on the UI)      | 0 min 50 secs
2              | 4                          | 1 min 18 secs
3              | 4                          | Long-running job. I think it times out after 20 minutes.

For 3 workers, this is the log:


*INFO server.PrepRequestProcessor: Got user-level KeeperException when
processing sessionid:0x153910879aa0000 type:create cxid:0x1 zxid:0x2
txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0004/_masterElectionDir
Error:KeeperErrorCode = NoNode for
/_hadoopBsp/giraph_yarn_application_1458425066569_0004/_masterElectionDir *
So, this fails for a reason I still haven't figured out. Can anyone explain
it?

Finally, this leads me to ask: how many workers can I have with 4 cores on
my machine? Are the number of cores and the number of workers related?
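
The first two rows of the table above are consistent with containers = workers + 2, which would match one application master plus one master/ZooKeeper task (the MASTER_ZOOKEEPER_ONLY task in the logs) plus the workers. The sketch below is only that arithmetic. The `extra=2` overhead is inferred from the table, and the node memory and container sizes in the example calls are hypothetical placeholders, not values read from this cluster. It does not model how YARN actually rounds or schedules container requests.

```python
def containers_needed(workers, extra=2):
    """Containers a job would request, assuming `extra` non-worker containers.

    extra=2 is an assumption inferred from the table: 1 worker -> 3, 2 workers -> 4.
    """
    return workers + extra


def max_workers(node_memory_mb, container_mb, extra=2):
    """Largest worker count whose containers fit in a memory budget.

    Both arguments are hypothetical inputs; real YARN container sizes are
    rounded by the scheduler and may differ from the requested heap.
    """
    fit = node_memory_mb // container_mb
    return max(fit - extra, 0)


if __name__ == "__main__":
    for w in (1, 2, 3):
        print(w, "workers ->", containers_needed(w), "containers")
    # Hypothetical budget: 4096 MB for containers, 1024 MB each.
    print("workers that fit:", max_workers(4096, 1024))
```

If the memory budget only admits four containers, a three-worker job would be one container short, which is one possible reading of the third row, not a confirmed diagnosis.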


*2. How do I use the setting giraph.yarn.task.heap.mb=x? I set x to 2048,
but my job runs indefinitely (hangs). It works great with the default setting
of 1024.*


*The job takes forever when x=2048; the userlogs say:*
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/home//Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/var/folders/s4/n58qlsh97t11vkmhysts8k680000gn/T/
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client environment:os.name=Mac
OS X
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:os.arch=x86_64
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:os.version=10.11.3
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client environment:user.name
=Anirudh
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:user.home=/Users/Anirudh
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/private/tmp/hadoop-Anirudh/nm-local-dir/usercache/Anirudh/appcache/application_1458425066569_0001/container_1458425066569_0001_01_000002
16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=172.24.18.199:22181 sessionTimeout=60000
watcher=org.apache.giraph.master.BspServiceMaster@4097cac
16/03/19 18:05:36 INFO zookeeper.ClientCnxn: Opening socket connection to
server 172.24.18.199/172.24.18.199:22181. Will not attempt to authenticate
using SASL (unknown error)
16/03/19 18:05:36 INFO server.NIOServerCnxnFactory: Accepted socket
connection from /172.24.18.199:61815
16/03/19 18:05:36 INFO zookeeper.ClientCnxn: Socket connection established
to 172.24.18.199/172.24.18.199:22181, initiating session
16/03/19 18:05:36 INFO server.ZooKeeperServer: Client attempting to
establish new session at /172.24.18.199:61815
16/03/19 18:05:36 INFO persistence.FileTxnLog: Creating new log file: log.1
16/03/19 18:05:36 INFO server.ZooKeeperServer: Established session
0x15390e97a2a0000 with negotiated timeout 600000 for client /
172.24.18.199:61815
16/03/19 18:05:36 INFO zookeeper.ClientCnxn: Session establishment complete
on server 172.24.18.199/172.24.18.199:22181, sessionid = 0x15390e97a2a0000,
negotiated timeout = 600000
16/03/19 18:05:36 INFO bsp.BspService: process: Asynchronous connection
complete.
16/03/19 18:05:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY starting...
16/03/19 18:05:36 INFO graph.GraphTaskManager: map: No need to do anything
when not a worker
16/03/19 18:05:36 INFO graph.GraphTaskManager: cleanup: Starting for
MASTER_ZOOKEEPER_ONLY
16/03/19 18:05:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x1 zxid:0x2 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir
Error:KeeperErrorCode = NoNode for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir
16/03/19 18:05:36 INFO master.BspServiceMaster: becomeMaster: First child
is
'/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir/172.24.18.199_00000000000'
and my bid is
'/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir/172.24.18.199_00000000000'
16/03/19 18:05:36 INFO netty.NettyServer: NettyServer: Using execution
group with 8 threads for requestFrameDecoder.
16/03/19 18:05:36 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
16/03/19 18:05:36 INFO netty.NettyServer: start: Started server
communication server: /172.24.18.199:30000 with up to 16 threads on bind
attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
16/03/19 18:05:36 INFO netty.NettyClient: NettyClient: Using execution
handler with 8 threads after request-encoder.
16/03/19 18:05:36 INFO master.BspServiceMaster: becomeMaster: I am now the
master!
16/03/19 18:05:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0xe zxid:0x9 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0
Error:KeeperErrorCode = NoNode for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0
16/03/19 18:05:36 INFO bsp.BspService: process: applicationAttemptChanged
signaled
16/03/19 18:05:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x16 zxid:0xc txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1
Error:KeeperErrorCode = NoNode for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1
16/03/19 18:05:36 WARN bsp.BspService: process: Unknown and unprocessed
event
(path=/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
16/03/19 18:05:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:06:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
569971 more msecs left before giving up.
16/03/19 18:06:06 INFO master.BspServiceMaster:
logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
16/03/19 18:06:06 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x22 zxid:0x10 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
16/03/19 18:06:06 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x23 zxid:0x11 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
16/03/19 18:06:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:06:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
539950 more msecs left before giving up.
16/03/19 18:06:36 INFO master.BspServiceMaster:
logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
16/03/19 18:06:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x26 zxid:0x12 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
16/03/19 18:06:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x27 zxid:0x13 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
16/03/19 18:06:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:07:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
509938 more msecs left before giving up.
16/03/19 18:07:06 INFO master.BspServiceMaster:
logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
16/03/19 18:07:06 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x2a zxid:0x14 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
16/03/19 18:07:06 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x2b zxid:0x15 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
16/03/19 18:07:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:07:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
479927 more msecs left before giving up.
16/03/19 18:07:36 INFO master.BspServiceMaster:
logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
16/03/19 18:07:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x2e zxid:0x16 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
16/03/19 18:07:36 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x2f zxid:0x17 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
16/03/19 18:07:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:08:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
449916 more msecs left before giving up.
16/03/19 18:08:06 INFO master.BspServiceMaster:
logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
16/03/19 18:08:06 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x32 zxid:0x18 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
16/03/19 18:08:06 INFO server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x15390e97a2a0000 type:create
cxid:0x33 zxid:0x19 txntype:-1 reqpath:n/a Error
Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
Error:KeeperErrorCode = NodeExists for
/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
16/03/19 18:08:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:08:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
419894 more msecs left before giving up.
----------------------------------------End of
Logs--------------------------------------------------------

Does this mean that the master needed 1 worker to start superstep -1 but
did not find any, or is it something else?
- If that is the case, I can confirm that I have one node (the only one),
and it is healthy.
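
To make the repeated checkWorkers lines easier to read, here is a minimal sketch that pulls out the found/needed counts and the remaining wait time. The regular expression is written against the wording of the log lines pasted above only; it is an assumption that other Giraph versions format the message the same way, and the function name is made up for illustration.

```python
import re

# Matches only the master's periodic report, e.g.
# "checkWorkers: Only found 0 responses of 1 needed to start superstep -1.
#  Reporting every 30000 msecs, 569971 more msecs left before giving up."
PATTERN = re.compile(
    r"checkWorkers: Only found (\d+) responses of (\d+) needed to "
    r"start superstep (-?\d+)\. Reporting every \d+ msecs, "
    r"(\d+) more msecs left"
)


def worker_wait_reports(log_text):
    """Return (found, needed, superstep, msecs_left) for each report line."""
    # The pasted log is hard-wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(flat)]


SAMPLE = """\
16/03/19 18:06:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
569971 more msecs left before giving up.
16/03/19 18:06:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
start superstep -1
16/03/19 18:06:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
539950 more msecs left before giving up.
"""

if __name__ == "__main__":
    for found, needed, superstep, left in worker_wait_reports(SAMPLE):
        print(f"superstep {superstep}: {found}/{needed} workers, {left} ms left")
```

Read this way, the master is counting down while waiting for one worker response and has received none so far.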

Finally, how do I give more memory to the giraph job?


*3. When I kill a long-running job, either by killing the application in the
GUI or by pressing Control+C, why does jps still show these?*
anirudh:hadoop-2.7.2 Anirudh$ jps
9009 GiraphYarnTask
8962 GiraphApplicationMaster
9267 Jps
8725 NameNode
8855 NodeManager
8764 DataNode
8815 ResourceManager

Aren't they supposed to be killed as well? I ask because if I run a new job
at this point, it stays in the ACCEPTED state forever. Only after those
processes are killed does a fresh job run to completion (my observation).
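
For spotting the leftover processes, a small sketch that picks the Giraph entries out of jps output. The two class names are the ones shown in the listing above. The function name is hypothetical, and the script only reports PIDs; it does not kill anything.

```python
# Process names taken from the jps listing in this post.
GIRAPH_CLASSES = {"GiraphYarnTask", "GiraphApplicationMaster"}


def leftover_giraph_pids(jps_output):
    """Return PIDs of Giraph JVMs found in `jps` output ("<pid> <name>" lines)."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1] in GIRAPH_CLASSES:
            pids.append(int(parts[0]))
    return pids


SAMPLE = """\
9009 GiraphYarnTask
8962 GiraphApplicationMaster
9267 Jps
8725 NameNode
8855 NodeManager
8764 DataNode
8815 ResourceManager
"""

if __name__ == "__main__":
    print(leftover_giraph_pids(SAMPLE))
```

On a live machine the input could come from `subprocess.run(["jps"], capture_output=True, text=True).stdout` instead of the sample string.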

Alright then, I am hoping for a reply which will address these issues.

Thanks
Anirudh

Re: Settings : number of workers, heap size. jps showing java tasks even after killing application.

Posted by Anirudh Perugu <an...@stonybrook.edu>.
Just bumping this thread as I am still looking for answers.

On Sun, Mar 20, 2016 at 11:31 AM, Anirudh Perugu <
anirudh.perugu@stonybrook.edu> wrote:

> Hello all,
>
> This is in follow up to *1. How many workers can I have? *
> So I understand that per-worker parallelism is achieved using compute
> threads. Hence, giraph.numComputeThreads maximum value is limited to # of
> cores. What is the # of workers limited to? (cannot be 1 for my setup as
> job runs successfully with 2).
>

Re: Settings : number of workers, heap size. jps showing java tasks even after killing application.

Posted by Anirudh Perugu <an...@stonybrook.edu>.
Hello all,

This is a follow-up to *1. How many workers can I have?*
As I understand it, parallelism within a worker is achieved with compute
threads, so the maximum useful value of giraph.numComputeThreads is bounded
by the number of cores. What limits the number of workers? (The limit cannot
be 1 on my setup, since the job runs successfully with 2 workers.)
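
One possible way to think about the worker limit on a single node, offered as a sketch and not a confirmed answer: the application master and every Giraph task run in separate YARN containers, so the number of workers that can actually start may be bounded by how many containers fit in the memory the NodeManager offers, not by the core count. The Python below only does that arithmetic. The function names, the 4096 MB NodeManager budget, and the "2 extra containers" figure are all assumptions chosen for illustration (the last one is read off the table in the quoted message, where 1 worker used 3 containers and 2 workers used 4); none of them were read from this cluster's configuration.

```python
def max_workers(nm_memory_mb, container_mb, overhead_containers=2):
    """Return how many worker containers fit in the NodeManager memory.

    nm_memory_mb: memory the NodeManager offers to containers (assumed value).
    container_mb: memory requested per container (assumed equal for all).
    overhead_containers: non-worker containers, e.g. the application master
        and a master task (assumption based on the table in this thread).
    """
    total_containers = nm_memory_mb // container_mb
    return max(0, total_containers - overhead_containers)


def can_start(workers, nm_memory_mb, container_mb, overhead_containers=2):
    """True if the requested number of workers fits in the memory budget."""
    return workers <= max_workers(nm_memory_mb, container_mb, overhead_containers)


# Hypothetical budget of 4096 MB, NOT taken from the poster's yarn-site.xml.
NM_MEMORY_MB = 4096

print(max_workers(NM_MEMORY_MB, 1024))   # workers that fit with 1024 MB containers
print(can_start(3, NM_MEMORY_MB, 1024))  # does a 3-worker job fit?
print(max_workers(NM_MEMORY_MB, 2048))   # workers that fit with 2048 MB containers
```

If a model like this applies, a job that requests more workers than fit would sit waiting for containers that are never granted, which would look like the "Only found 0 responses of 1 needed" messages in the quoted log. Checking the memory actually configured for the NodeManager would confirm or rule this out.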


On Sat, Mar 19, 2016 at 6:56 PM, Anirudh Perugu <
anirudh.perugu@stonybrook.edu> wrote:

> Hello All,
>
> I am a Giraph newbie, so kindly bear with me. I am trying to run BFS on a
> graph with 28048 edges and 786 nodes.
>
> *Here is my setup:*
> Single-node cluster, 8 GB RAM, 4 cores, Apache Hadoop YARN 2.7.2, Giraph
> 1.2.0. Everything (YARN + Giraph) runs on this one machine.
>
> *1. How many workers can I have?*
> I ask because my Giraph job runs fine with the settings *-w 1 -ca
> giraph.SplitMasterWorker=false*
>
> "Testing Results Table"
>
> *no. of workers | maximum no. of containers used | time taken for completion*
> 1               | 3 (as seen on the UI)          | 0 min 50 secs
> 2               | 4                              | 1 min 18 secs
> 3               | 4                              | Long-running job. I think it times out after 20 minutes.
>
> For 3 workers, this is the log:
>
>
> *INFO server.PrepRequestProcessor: Got user-level KeeperException when
> processing sessionid:0x153910879aa0000 type:create cxid:0x1 zxid:0x2
> txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0004/_masterElectionDir
> Error:KeeperErrorCode = NoNode for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0004/_masterElectionDir *
> So this fails for a reason I still haven't figured out. Can anyone explain
> it?
>
> Finally, this leads me to ask: how many workers can I have with 4 cores on
> my machine? Are the number of cores and the number of workers related?
>
>
> *2. How do I use the setting giraph.yarn.task.heap.mb=x? I set x to 2048,
> but then my job runs indefinitely (hangs). It works fine with the default
> setting of 1024.*
>
>
> *The job takes forever when x=2048. The userlogs say:*
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:java.library.path=/home//Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/var/folders/s4/n58qlsh97t11vkmhysts8k680000gn/T/
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client environment:os.name=Mac
> OS X
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:os.arch=x86_64
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:os.version=10.11.3
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client environment:user.name
> =Anirudh
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:user.home=/Users/Anirudh
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Client
> environment:user.dir=/private/tmp/hadoop-Anirudh/nm-local-dir/usercache/Anirudh/appcache/application_1458425066569_0001/container_1458425066569_0001_01_000002
> 16/03/19 18:05:36 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=172.24.18.199:22181 sessionTimeout=60000
> watcher=org.apache.giraph.master.BspServiceMaster@4097cac
> 16/03/19 18:05:36 INFO zookeeper.ClientCnxn: Opening socket connection to
> server 172.24.18.199/172.24.18.199:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 16/03/19 18:05:36 INFO server.NIOServerCnxnFactory: Accepted socket
> connection from /172.24.18.199:61815
> 16/03/19 18:05:36 INFO zookeeper.ClientCnxn: Socket connection established
> to 172.24.18.199/172.24.18.199:22181, initiating session
> 16/03/19 18:05:36 INFO server.ZooKeeperServer: Client attempting to
> establish new session at /172.24.18.199:61815
> 16/03/19 18:05:36 INFO persistence.FileTxnLog: Creating new log file: log.1
> 16/03/19 18:05:36 INFO server.ZooKeeperServer: Established session
> 0x15390e97a2a0000 with negotiated timeout 600000 for client /
> 172.24.18.199:61815
> 16/03/19 18:05:36 INFO zookeeper.ClientCnxn: Session establishment
> complete on server 172.24.18.199/172.24.18.199:22181, sessionid =
> 0x15390e97a2a0000, negotiated timeout = 600000
> 16/03/19 18:05:36 INFO bsp.BspService: process: Asynchronous connection
> complete.
> 16/03/19 18:05:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY starting...
> 16/03/19 18:05:36 INFO graph.GraphTaskManager: map: No need to do anything
> when not a worker
> 16/03/19 18:05:36 INFO graph.GraphTaskManager: cleanup: Starting for
> MASTER_ZOOKEEPER_ONLY
> 16/03/19 18:05:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x1 zxid:0x2 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir
> Error:KeeperErrorCode = NoNode for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir
> 16/03/19 18:05:36 INFO master.BspServiceMaster: becomeMaster: First child
> is
> '/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir/172.24.18.199_00000000000'
> and my bid is
> '/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_masterElectionDir/172.24.18.199_00000000000'
> 16/03/19 18:05:36 INFO netty.NettyServer: NettyServer: Using execution
> group with 8 threads for requestFrameDecoder.
> 16/03/19 18:05:36 INFO Configuration.deprecation: mapred.map.tasks is
> deprecated. Instead, use mapreduce.job.maps
> 16/03/19 18:05:36 INFO netty.NettyServer: start: Started server
> communication server: /172.24.18.199:30000 with up to 16 threads on bind
> attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
> 16/03/19 18:05:36 INFO netty.NettyClient: NettyClient: Using execution
> handler with 8 threads after request-encoder.
> 16/03/19 18:05:36 INFO master.BspServiceMaster: becomeMaster: I am now the
> master!
> 16/03/19 18:05:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0xe zxid:0x9 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0
> Error:KeeperErrorCode = NoNode for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0
> 16/03/19 18:05:36 INFO bsp.BspService: process: applicationAttemptChanged
> signaled
> 16/03/19 18:05:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x16 zxid:0xc txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1
> Error:KeeperErrorCode = NoNode for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1
> 16/03/19 18:05:36 WARN bsp.BspService: process: Unknown and unprocessed
> event
> (path=/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir,
> type=NodeChildrenChanged, state=SyncConnected)
> 16/03/19 18:05:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
> start superstep -1
> 16/03/19 18:06:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
> responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
> 569971 more msecs left before giving up.
> 16/03/19 18:06:06 INFO master.BspServiceMaster:
> logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
> 16/03/19 18:06:06 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x22 zxid:0x10 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> 16/03/19 18:06:06 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x23 zxid:0x11 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> 16/03/19 18:06:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
> start superstep -1
> 16/03/19 18:06:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
> responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
> 539950 more msecs left before giving up.
> 16/03/19 18:06:36 INFO master.BspServiceMaster:
> logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
> 16/03/19 18:06:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x26 zxid:0x12 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> 16/03/19 18:06:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x27 zxid:0x13 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> 16/03/19 18:06:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
> start superstep -1
> 16/03/19 18:07:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
> responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
> 509938 more msecs left before giving up.
> 16/03/19 18:07:06 INFO master.BspServiceMaster:
> logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
> 16/03/19 18:07:06 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x2a zxid:0x14 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> 16/03/19 18:07:06 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x2b zxid:0x15 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> 16/03/19 18:07:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
> start superstep -1
> 16/03/19 18:07:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
> responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
> 479927 more msecs left before giving up.
> 16/03/19 18:07:36 INFO master.BspServiceMaster:
> logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
> 16/03/19 18:07:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x2e zxid:0x16 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> 16/03/19 18:07:36 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x2f zxid:0x17 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> 16/03/19 18:07:36 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
> start superstep -1
> 16/03/19 18:08:06 INFO master.BspServiceMaster: checkWorkers: Only found 0
> responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
> 449916 more msecs left before giving up.
> 16/03/19 18:08:06 INFO master.BspServiceMaster:
> logMissingWorkersOnSuperstep: No response from partition 1 (could be master)
> 16/03/19 18:08:06 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x32 zxid:0x18 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir
> 16/03/19 18:08:06 INFO server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x15390e97a2a0000 type:create
> cxid:0x33 zxid:0x19 txntype:-1 reqpath:n/a Error
> Path:/_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> Error:KeeperErrorCode = NodeExists for
> /_hadoopBsp/giraph_yarn_application_1458425066569_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerUnhealthyDir
> 16/03/19 18:08:06 INFO yarn.GiraphYarnTask: [STATUS: task-0]
> MASTER_ZOOKEEPER_ONLY checkWorkers: Only found 0 responses of 1 needed to
> start superstep -1
> 16/03/19 18:08:36 INFO master.BspServiceMaster: checkWorkers: Only found 0
> responses of 1 needed to start superstep -1.  Reporting every 30000 msecs,
> 419894 more msecs left before giving up.
> ---------------------------------------- End of Logs ----------------------------------------
>
> Does this mean that the master needed 1 worker to start superstep -1 but
> did not find any, or is it something else?
> - If that is the case, I can confirm that my only node is healthy.
>
> Finally, how do I give more memory to the Giraph job?
>
>
> *3. When I kill a long-running job, either by killing the application in
> the GUI or by pressing Ctrl+C, why does jps still show these?*
> anirudh:hadoop-2.7.2 Anirudh$ jps
>
> *9009 GiraphYarnTask*
> *8962 GiraphApplicationMaster*
> 9267 Jps
> 8725 NameNode
> 8855 NodeManager
> 8764 DataNode
> 8815 ResourceManager
>
> Aren't they supposed to be killed as well? I ask because a new job
> submitted at this point stays in the ACCEPTED state forever. Only after I
> kill those processes does a fresh job run to completion (my observation).
>
> Alright then, I am hoping for a reply which will address these issues.
>
> Thanks
> Anirudh
>
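
For question 3 in the quoted message, here is a small sketch of how leftover Giraph JVMs could be spotted in jps output before a new job is submitted. It only parses text in the "pid name" form that jps prints and does not kill anything. The process names come from the jps listing in the quoted message; the function name leftover_giraph_pids is hypothetical.

```python
# Process names as they appear in the jps listing quoted in this thread.
GIRAPH_PROCESS_NAMES = {"GiraphYarnTask", "GiraphApplicationMaster"}


def leftover_giraph_pids(jps_output):
    """Return the PIDs of Giraph JVMs found in jps-style output, in order."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # Expect exactly "<pid> <main class name>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit() and parts[1] in GIRAPH_PROCESS_NAMES:
            pids.append(int(parts[0]))
    return pids


# The listing from the quoted message, one process per line.
sample = """9009 GiraphYarnTask
8962 GiraphApplicationMaster
9267 Jps
8725 NameNode
8855 NodeManager
8764 DataNode
8815 ResourceManager"""

print(leftover_giraph_pids(sample))  # prints [9009, 8962]
```

An empty result would mean no Giraph task or application master JVM is still running, which according to the observation above is the state in which a fresh job completes.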