Posted to user@beam.apache.org by amir bahmanyari <am...@yahoo.com> on 2016/10/31 17:53:05 UTC

Beam App: MEM<-->Disk Space Issue

Hi Colleagues,

Perhaps some of these questions should be asked in the Flink forum. Please let me know if that's the case so I can post them there as well.
I am facing something new when running my Beam app in a 4-node Flink cluster. The behavior, item by item (a simplified sketch of the pipeline follows this list):

1- The dashboard shows all nodes actively running.
2- All slots are being consumed.
3- There is a Flink daemon instance running on every node.
4- I successfully submit the app fat jar to the cluster from the server where the JM is running.
5- All TaskManagers log progress reports.
6- Only one of the nodes, chosen at random, reports *.out logs (logs written by the app, not Flink). The other nodes don't report any *.out; those files stay at zero size.
7- The app runs under heavy load for some reasonable time before it starts crunching, i.e. slowing down, which is expected.
8- The node where the *.out gets reported runs out of space and "/" hits 100%. On the other nodes "/" stays at 10%.
9- I get the exception shown at the end of this message in the *.out of the only node that reports *.out.
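For context, the pipeline is essentially of the following shape. This is only a simplified sketch, not the actual benchmark code: the topic, bootstrap servers, and the processing step are placeholders, and the exact KafkaIO builder methods depend on the Beam version.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BenchPipelineSketch {
  public static void main(String[] args) {
    // Runner is selected on the command line, e.g. --runner=FlinkRunner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadFromKafka",
        KafkaIO.<String, String>read()
            .withBootstrapServers("kafkahost:9092")  // placeholder
            .withTopic("events")                     // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())                      // drop Kafka metadata, keep KV<K, V>
     .apply("Process", MapElements.via(new SimpleFunction<KV<String, String>, String>() {
        @Override
        public String apply(KV<String, String> rec) {
          // Anything printed here goes to the TaskManager's stdout, i.e. the
          // taskmanager-*.out file on whichever node happens to run this task.
          System.out.println("processed: " + rec.getKey());
          return rec.getValue();
        }
      }));

    p.run();
  }
}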
Questions (a sketch of the logging configuration I am considering follows this list):

1- Why don't the other nodes report *.out at runtime, and why does only one random node report it?
2- What should I do in my app to avoid this exception?
3- What can I configure in Flink, in my environment, or in the servers' config to avoid this issue?
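For question 3, my current thinking is to cap the log files with a rolling appender in Flink's conf/log4j.properties, and to quiet the logger that appears in the stack trace below. This is only a sketch: the 100MB/10-file limits are numbers I picked, and I understand the *.out files capture stdout and so are not governed by log4j at all, so please correct me if this is the wrong direction.

log4j.rootLogger=INFO, file

# Rolling appender: caps each file at 100MB and keeps at most 10 backups,
# so log4j output can never consume more than ~1GB of "/" per TaskManager.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${log.file}
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

# Raise the level of the KafkaIO logger that floods the log; it is the one
# emitting the WARN visible in the stack trace below.
log4j.logger.org.apache.beam.sdk.io.kafka=ERROR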
I really appreciate your valuable time and help. It is crucial for us to get past this issue in our benchmarking efforts and product-selection process.
Cheers, and here is the exception:
Amir-
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:420)
        at org.apache.beam.sdk.io.kafka.KafkaIO$UnboundedKafkaReader.getWatermark(KafkaIO.java:1078)
        at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.trigger(UnboundedSourceWrapper.java:366)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
[abahman@beam4 log]$