Posted to user@ignite.apache.org by zh...@gmail.com on 2017/04/12 09:28:34 UTC

OOM when using Ignite as HDFS Cache

Hi there,

 

I’d like to use Ignite as an HDFS cache in my cluster, but it fails with an OOM error. Could you help review my configuration so I can avoid it?

 

I’m using DUAL_ASYNC mode, and the Ignite nodes can find each other to establish the cluster. There are very few changes in default-config.xml, but it is attached for your review. The JVM heap size is limited to 1GB. Ignite suffers from an OOM exception when I run the Hadoop benchmark TestDFSIO writing 4*4GB files. I think a 4GB file is written to HDFS as a stream, so Ignite should be able to handle it. It’s acceptable to slow down the write performance while Ignite flushes cached data to HDFS, but it is not acceptable for this to lead to a crash or data loss.
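
For reference, the benchmark run is essentially the standard TestDFSIO invocation sketched below (the jar path depends on the Hadoop layout, and older Hadoop versions spell the size flag as -fileSize with a value in MB, so treat this as a sketch):

  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -write -nrFiles 4 -size 4GB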

 

The Ignite log is attached as ignite_log.zip; some key messages are excerpted here:

 

17/04/12 00:49:17 INFO [grid-timeout-worker-#19%null%] internal.IgniteKernal: 

Metrics for local node (to disable set 'metricsLogFrequency' to 0)

    ^-- Node [id=9b5dcc35, name=null, uptime=00:26:00:254]

    ^-- H/N/C [hosts=173, nodes=173, CPUs=2276]

    ^-- CPU [cur=0.13%, avg=0.82%, GC=0%]

    ^-- Heap [used=555MB, free=43.3%, comm=979MB]

    ^-- Non heap [used=61MB, free=95.95%, comm=62MB]

    ^-- Public thread pool [active=0, idle=0, qSize=0]

    ^-- System thread pool [active=0, idle=6, qSize=0]

    ^-- Outbound messages queue [size=0]

17/04/12 00:50:06 INFO [disco-event-worker-#35%null%] discovery.GridDiscoveryManager: Added new node to topology: TcpDiscoveryNode [id=553b5c1a-da0b-43cb-b691-b842352b3105, addrs=[0:0:0:0:0:0:0:1, 10.152.133.46, 10.55.68.223, 127.0.0.1, 192.168.1.1], sockAddrs=[BN1APS0A98852E/10.152.133.46:47500, bn1sch010095221.phx.gbl/10.55.68.223:47500, /0:0:0:0:0:0:0:1:47500, /192.168.1.1:47500, /127.0.0.1:47500], discPort=47500, order=176, intOrder=175, lastExchangeTime=1491983403106, loc=false, ver=2.0.0#20170405-sha1:2c830b0d, isClient=false]

[00:50:06] Topology snapshot [ver=176, servers=174, clients=0, CPUs=2288, heap=180.0GB]

...

Exception in thread "igfs-client-worker-2-#585%null%" java.lang.OutOfMemoryError: GC overhead limit exceeded

  at java.util.Arrays.copyOf(Arrays.java:3332)

  at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)

  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)

  at java.lang.StringBuffer.append(StringBuffer.java:270)

  at java.io.StringWriter.write(StringWriter.java:112)

  at java.io.PrintWriter.write(PrintWriter.java:456)

  at java.io.PrintWriter.write(PrintWriter.java:473)

  at java.io.PrintWriter.print(PrintWriter.java:603)

  at java.io.PrintWriter.println(PrintWriter.java:756)

  at java.lang.Throwable$WrappedPrintWriter.println(Throwable.java:764)

  at java.lang.Throwable.printStackTrace(Throwable.java:658)

  at java.lang.Throwable.printStackTrace(Throwable.java:721)

  at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)

  at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)

  at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)

  at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:162)

  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)

  at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

  at org.apache.log4j.Category.callAppenders(Category.java:206)

  at org.apache.log4j.Category.forcedLog(Category.java:391)

  at org.apache.log4j.Category.error(Category.java:322)

  at org.apache.ignite.logger.log4j.Log4JLogger.error(Log4JLogger.java:495)

  at org.apache.ignite.internal.GridLoggerProxy.error(GridLoggerProxy.java:148)

  at org.apache.ignite.internal.util.IgniteUtils.error(IgniteUtils.java:4281)

  at org.apache.ignite.internal.util.IgniteUtils.error(IgniteUtils.java:4306)

  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:126)

  at java.lang.Thread.run(Thread.java:745)

Exception in thread "LeaseRenewer:hadoop@namenode-vip.yarn3-dev-bn2.bn2.ap.gbl" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception in thread "igfs-delete-worker%igfs%9b5dcc35-3a4c-4a90-ac9e-89fdd65302a7%" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception in thread "exchange-worker-#39%null%" java.lang.OutOfMemoryError: GC overhead limit exceeded

…

17/04/12 01:40:10 WARN [disco-event-worker-#35%null%] discovery.GridDiscoveryManager: Stopping local node according to configured segmentation policy.

 

Looking forward to your help.

 

 

Regards,

Shuai Zhang


RE: OOM when using Ignite as HDFS Cache

Posted by dkarachentsev <dk...@gridgain.com>.
Hi,

The correct approach here is to understand where the problem actually occurs
and then decide how to solve it.

> I also notice that Ignite write through file block size is set to 64MB. I
> mean I write a file to Ignite with block size to 4GB, but I finally found
> it on HDFS with block size 64MB. Is there any configuration for it? 

I'm not sure I understand your question, but the HDFS block size is
configured in the Hadoop configuration files.
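
For example, the cluster-wide default can be set in hdfs-site.xml via dfs.blocksize (value in bytes; the number below is only an illustration):

<property>
  <name>dfs.blocksize</name>
  <!-- 128 MB; set whatever block size you actually want -->
  <value>134217728</value>
</property>

Tools that accept generic options can also override it per command with -D dfs.blocksize=<bytes>.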

-Dmitry.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11974.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: OOM when using Ignite as HDFS Cache

Posted by Ivan Veselovsky <iv...@gridgain.com>.
Hi zhangshuai.ustc,
is this problem solved? Can we help further on the subject?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p12297.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: OOM when using Ignite as HDFS Cache

Posted by zh...@gmail.com.
Yes, I'm getting a "GC overhead limit exceeded" OOME, and I think this is unexpected behavior. I'll try the off-heap options in a few days. Thanks for your advice.

I'm providing an HDFS service to our customers. As you know, HDFS is not friendly to many small files, so we sometimes need to merge files into a large block to reduce metadata size. I think you are correct that HDFS would face the same issue as Ignite; I need to configure the same max heap size for both of them.
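
As a sketch of what I mean (the exact knobs are assumptions to verify against our Hadoop and Ignite versions): the HDFS daemon heap can be set in hadoop-env.sh, and the Ignite node heap via the JVM options picked up by ignite.sh, e.g.

  # hadoop-env.sh (Hadoop 2.x): daemon heap size in MB
  export HADOOP_HEAPSIZE=4096

  # before starting the Ignite node: give it the same maximum heap
  export JVM_OPTS="-Xms4g -Xmx4g"
  bin/ignite.sh config/default-config.xml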

-----Original Message-----
From: Kamil Misuth [mailto:kimec@ethome.sk] 
Sent: Tuesday, April 18, 2017 3:35 PM
To: user@ignite.apache.org
Subject: Re: OOM when using Ignite as HDFS Cache

Are you getting "GC Overhead limit exceeded" OOME?
I think you could always move IGFS data block cache off heap if it is not the case already.

I am wondering why you've set block size to 4 GB for Ignite when HDFS stock configured block size is either 64 MB or 128 MB. Have you tried to set HDFS block size to 4GB? I am guessing you would get OOME on HDFS data nodes too.

Kamil

On 2017-04-14 08:50, 张帅 wrote:
> I'm using the latest version of JDK, AKA. 1.8.0_121
> 
> The cache is aim to provide a faster read/write performance. But the 
> availability is more important. 1GB cache is for testing purpose. But 
> it's the same issue if I write a 1TB file to 64GB cache.
> 
> What I mean availability is that Ignite should not exit with OOME.
> Slow down write performance is kind of downgrade. If I write directly 
> to HDFS, I got a write performance of x MB/s. If I write through 
> Ignite, I got a higher performance y MB/s. It is great if y far more 
> larger than x, and also acceptable equal to x sometimes, but not 
> acceptable if HDFS still working but Ignite not working.
> 
> Breaking into small blocks is possible because data coming in a kind 
> of stream. We are always able to pack it whenever we collected 512MB 
> data.
> 
> This issue is not about Cache Eviction Strategy, but about how to 
> avoid OOME & service not available. Cache eviction would not solve it 
> because there do have more data than cache capacity.
> 
> 
> -----Original Message-----
> From: Jörn Franke [mailto:jornfranke@gmail.com]
> Sent: Friday, April 14, 2017 2:36 PM
> To: user@ignite.apache.org
> Subject: Re: OOM when using Ignite as HDFS Cache
> 
> I would not expect any of the things that you mention. A cache is not 
> supposed to slow down writing. This does not make sense from my point 
> of view. Splitting a block into several smaller ones is also not 
> feasible. The data has to go somewhere before splitting.
> 
> I think what you refer to is certain cache eviction strategies.
> 1 GB of cache sounds small for a HDFS cache.
> I suggest to enable the default configuration of ignite on HDFS and 
> then change it step by step to your envisioned configuration.
> 
> That being said, a Hadoop platform with a lot of ecosystem components 
> can be complex, in particular you need to calculate that each of the 
> components (hive, spark etc) has certain memory assigned or has it 
> used when jobs are running. So even if you have configured 1 gb 
> somebody else might have taken it. Less probable but possible is that 
> your JDK has a bug leading to OOME. You may also try to upgrade it.
> 
>> On 14. Apr 2017, at 08:12, <zh...@gmail.com> 
>> <zh...@gmail.com> wrote:
>> 
>> I think it's a kind of misconfiguration. The Ignite document just 
>> mentioned about how to configuration HDFS as a secondary filesystem 
>> but nothing about how to restrict the memory usage to avoid OOME.
>> https://apacheignite.readme.io/v1.0/docs/igfs-secondary-file-system
>> 
>> Assume I configured the max JVM heap size to 1GB.
>> 1. What would happen if I write very fast before Ignite write data to 
>> HDFS asynchronized?
>> 2. What would happen if I want to write a 2GB file block to Ignite?
>> 
>> I expected:
>> 1. Ignite would slow down the write performance to avoid OOME.
>> 2. Ignite would break the 2GB file block into 512MB blocks & write 
>> them to HDFS to avoid OOME.
>> 
>> Do we have configurations against above behaviors? I dig some items 
>> from source code & Ignite Web Console, but seems they are not working 
>> fine.
>> 
>> <property name="fragmentizerConcurrentFiles" value="3"/>
>> <property name="dualModeMaxPendingPutsSize" value="10"/>
>> <property name="blockSize" value="536870912"/>
>> <property name="streamBufferSize" value="131072"/>
>> <property name="maxSpaceSize" value="6442450944"/>
>> <property name="maximumTaskRangeLength" value="536870912"/>
>> <property name="prefetchBlocks" value="2"/>
>> <property name="sequentialReadsBeforePrefetch" value="5"/>
>> <property name="defaultMode" value="DUAL_ASYNC" />
>> 
>> I also notice that Ignite write through file block size is set to 
>> 64MB. I mean I write a file to Ignite with block size to 4GB, but I 
>> finally found it on HDFS with block size 64MB. Is there any 
>> configuration for it?
>> 
>> -----Original Message-----
>> From: dkarachentsev [mailto:dkarachentsev@gridgain.com]
>> Sent: Thursday, April 13, 2017 11:21 PM
>> To: user@ignite.apache.org
>> Subject: Re: OOM when using Ignite as HDFS Cache
>> 
>> Hi Shuai,
>> 
>> Could you please take heap dump on OOME and find what objects consume 
>> memory? There would be a lot of byte[] objects, please find the 
>> nearest GC root for them.
>> 
>> Thanks!
>> 
>> -Dmitry.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11956.html
>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>> 




Re: OOM when using Ignite as HDFS Cache

Posted by Kamil Misuth <ki...@ethome.sk>.
Are you getting "GC Overhead limit exceeded" OOME?
I think you could always move IGFS data block cache off heap if it is 
not the case already.

I am wondering why you've set block size to 4 GB for Ignite when HDFS 
stock configured block size is either 64 MB or 128 MB. Have you tried to 
set HDFS block size to 4GB? I am guessing you would get OOME on HDFS 
data nodes too.
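
Just to illustrate what "off heap" means here, a minimal sketch in Ignite 1.x terms (the memory model changed in 2.x, so the property names below are assumptions to check against the CacheConfiguration/FileSystemConfiguration docs for your version):

<!-- hypothetical cache used for IGFS data blocks, keeping values off-heap -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="igfs-data"/>
    <property name="memoryMode" value="OFFHEAP_TIERED"/>
    <!-- cap off-heap usage at 4 GB -->
    <property name="offHeapMaxMemory" value="#{4L * 1024 * 1024 * 1024}"/>
</bean>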

Kamil

On 2017-04-14 08:50, 张帅 wrote:
> I'm using the latest version of JDK, AKA. 1.8.0_121
> 
> The cache is aim to provide a faster read/write performance. But the
> availability is more important. 1GB cache is for testing purpose. But
> it's the same issue if I write a 1TB file to 64GB cache.
> 
> What I mean availability is that Ignite should not exit with OOME.
> Slow down write performance is kind of downgrade. If I write directly
> to HDFS, I got a write performance of x MB/s. If I write through
> Ignite, I got a higher performance y MB/s. It is great if y far more
> larger than x, and also acceptable equal to x sometimes, but not
> acceptable if HDFS still working but Ignite not working.
> 
> Breaking into small blocks is possible because data coming in a kind
> of stream. We are always able to pack it whenever we collected 512MB
> data.
> 
> This issue is not about Cache Eviction Strategy, but about how to
> avoid OOME & service not available. Cache eviction would not solve it
> because there do have more data than cache capacity.
> 
> 
> -----Original Message-----
> From: Jörn Franke [mailto:jornfranke@gmail.com]
> Sent: Friday, April 14, 2017 2:36 PM
> To: user@ignite.apache.org
> Subject: Re: OOM when using Ignite as HDFS Cache
> 
> I would not expect any of the things that you mention. A cache is not
> supposed to slow down writing. This does not make sense from my point
> of view. Splitting a block into several smaller ones is also not
> feasible. The data has to go somewhere before splitting.
> 
> I think what you refer to is certain cache eviction strategies.
> 1 GB of cache sounds small for a HDFS cache.
> I suggest to enable the default configuration of ignite on HDFS and
> then change it step by step to your envisioned configuration.
> 
> That being said, a Hadoop platform with a lot of ecosystem components
> can be complex, in particular you need to calculate that each of the
> components (hive, spark etc) has certain memory assigned or has it
> used when jobs are running. So even if you have configured 1 gb
> somebody else might have taken it. Less probable but possible is that
> your JDK has a bug leading to OOME. You may also try to upgrade it.
> 
>> On 14. Apr 2017, at 08:12, <zh...@gmail.com> 
>> <zh...@gmail.com> wrote:
>> 
>> I think it's a kind of misconfiguration. The Ignite document just
>> mentioned about how to configuration HDFS as a secondary filesystem
>> but nothing about how to restrict the memory usage to avoid OOME.
>> https://apacheignite.readme.io/v1.0/docs/igfs-secondary-file-system
>> 
>> Assume I configured the max JVM heap size to 1GB.
>> 1. What would happen if I write very fast before Ignite write data to 
>> HDFS asynchronized?
>> 2. What would happen if I want to write a 2GB file block to Ignite?
>> 
>> I expected:
>> 1. Ignite would slow down the write performance to avoid OOME.
>> 2. Ignite would break the 2GB file block into 512MB blocks & write 
>> them to HDFS to avoid OOME.
>> 
>> Do we have configurations against above behaviors? I dig some items 
>> from source code & Ignite Web Console, but seems they are not working 
>> fine.
>> 
>> <property name="fragmentizerConcurrentFiles" value="3"/>
>> <property name="dualModeMaxPendingPutsSize" value="10"/>
>> <property name="blockSize" value="536870912"/>
>> <property name="streamBufferSize" value="131072"/>
>> <property name="maxSpaceSize" value="6442450944"/>
>> <property name="maximumTaskRangeLength" value="536870912"/>
>> <property name="prefetchBlocks" value="2"/>
>> <property name="sequentialReadsBeforePrefetch" value="5"/>
>> <property name="defaultMode" value="DUAL_ASYNC" />
>> 
>> I also notice that Ignite write through file block size is set to 
>> 64MB. I mean I write a file to Ignite with block size to 4GB, but I 
>> finally found it on HDFS with block size 64MB. Is there any 
>> configuration for it?
>> 
>> -----Original Message-----
>> From: dkarachentsev [mailto:dkarachentsev@gridgain.com]
>> Sent: Thursday, April 13, 2017 11:21 PM
>> To: user@ignite.apache.org
>> Subject: Re: OOM when using Ignite as HDFS Cache
>> 
>> Hi Shuai,
>> 
>> Could you please take heap dump on OOME and find what objects consume 
>> memory? There would be a lot of byte[] objects, please find the 
>> nearest GC root for them.
>> 
>> Thanks!
>> 
>> -Dmitry.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11956.html
>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>> 



RE: OOM when using Ignite as HDFS Cache

Posted by 张帅 <sa...@gmail.com>.
I'm using the latest version of the JDK, i.e. 1.8.0_121.

The cache aims to provide faster read/write performance, but availability is more important. The 1GB cache is for testing purposes; it would be the same issue if I wrote a 1TB file to a 64GB cache.

What I mean by availability is that Ignite should not exit with an OOME. Slowing down write performance is a kind of degradation. If I write directly to HDFS, I get a write throughput of x MB/s; if I write through Ignite, I get a higher throughput of y MB/s. It is great if y is far larger than x, and sometimes acceptable if it only equals x, but it is not acceptable if HDFS keeps working while Ignite does not.

Breaking the data into smaller blocks is possible because it arrives as a stream; we can always flush a block whenever we have collected 512MB of data.

This issue is not about the cache eviction strategy, but about how to avoid an OOME and an unavailable service. Cache eviction alone would not solve it because there really is more data than cache capacity.


-----Original Message-----
From: Jörn Franke [mailto:jornfranke@gmail.com] 
Sent: Friday, April 14, 2017 2:36 PM
To: user@ignite.apache.org
Subject: Re: OOM when using Ignite as HDFS Cache

I would not expect any of the things that you mention. A cache is not supposed to slow down writing. This does not make sense from my point of view. Splitting a block into several smaller ones is also not feasible. The data has to go somewhere before splitting. 

I think what you refer to is certain cache eviction strategies.
1 GB of cache sounds small for a HDFS cache.
I suggest to enable the default configuration of ignite on HDFS and then change it step by step to your envisioned configuration.

That being said, a Hadoop platform with a lot of ecosystem components can be complex, in particular you need to calculate that each of the components (hive, spark etc) has certain memory assigned or has it used when jobs are running. So even if you have configured 1 gb somebody else might have taken it. Less probable but possible is that your JDK has a bug leading to OOME. You may also try to upgrade it.

> On 14. Apr 2017, at 08:12, <zh...@gmail.com> <zh...@gmail.com> wrote:
> 
> I think it's a kind of misconfiguration. The Ignite document just 
> mentioned about how to configuration HDFS as a secondary filesystem 
> but nothing about how to restrict the memory usage to avoid OOME. 
> https://apacheignite.readme.io/v1.0/docs/igfs-secondary-file-system
> 
> Assume I configured the max JVM heap size to 1GB.
> 1. What would happen if I write very fast before Ignite write data to HDFS asynchronized?
> 2. What would happen if I want to write a 2GB file block to Ignite?
> 
> I expected:
> 1. Ignite would slow down the write performance to avoid OOME.
> 2. Ignite would break the 2GB file block into 512MB blocks & write them to HDFS to avoid OOME.
> 
> Do we have configurations against above behaviors? I dig some items from source code & Ignite Web Console, but seems they are not working fine. 
> 
> <property name="fragmentizerConcurrentFiles" value="3"/>
> <property name="dualModeMaxPendingPutsSize" value="10"/>
> <property name="blockSize" value="536870912"/>
> <property name="streamBufferSize" value="131072"/>
> <property name="maxSpaceSize" value="6442450944"/>
> <property name="maximumTaskRangeLength" value="536870912"/>
> <property name="prefetchBlocks" value="2"/>
> <property name="sequentialReadsBeforePrefetch" value="5"/>
> <property name="defaultMode" value="DUAL_ASYNC" />
> 
> I also notice that Ignite write through file block size is set to 64MB. I mean I write a file to Ignite with block size to 4GB, but I finally found it on HDFS with block size 64MB. Is there any configuration for it?
> 
> -----Original Message-----
> From: dkarachentsev [mailto:dkarachentsev@gridgain.com]
> Sent: Thursday, April 13, 2017 11:21 PM
> To: user@ignite.apache.org
> Subject: Re: OOM when using Ignite as HDFS Cache
> 
> Hi Shuai,
> 
> Could you please take heap dump on OOME and find what objects consume memory? There would be a lot of byte[] objects, please find the nearest GC root for them.
> 
> Thanks!
> 
> -Dmitry.
> 
> 
> 
> --
> View this message in context: 
> http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11956.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
> 


Re: OOM when using Ignite as HDFS Cache

Posted by Jörn Franke <jo...@gmail.com>.
I would not expect any of the things that you mention. A cache is not supposed to slow down writing. This does not make sense from my point of view. Splitting a block into several smaller ones is also not feasible. The data has to go somewhere before splitting. 

I think what you are referring to are certain cache eviction strategies.
1 GB sounds small for an HDFS cache.
I suggest enabling the default configuration of Ignite on HDFS and then changing it step by step toward your envisioned configuration.

That being said, a Hadoop platform with many ecosystem components can be complex; in particular, you need to account for the memory that each component (Hive, Spark, etc.) has assigned or in use while jobs are running. So even if you have configured 1 GB, somebody else might have taken it. Less probable, but possible, is that your JDK has a bug leading to the OOME; you may also try upgrading it.

> On 14. Apr 2017, at 08:12, <zh...@gmail.com> <zh...@gmail.com> wrote:
> 
> I think it's a kind of misconfiguration. The Ignite document just mentioned about how to configuration HDFS as a secondary filesystem but nothing about how to restrict the memory usage to avoid OOME. https://apacheignite.readme.io/v1.0/docs/igfs-secondary-file-system
> 
> Assume I configured the max JVM heap size to 1GB.
> 1. What would happen if I write very fast before Ignite write data to HDFS asynchronized?
> 2. What would happen if I want to write a 2GB file block to Ignite?
> 
> I expected:
> 1. Ignite would slow down the write performance to avoid OOME.
> 2. Ignite would break the 2GB file block into 512MB blocks & write them to HDFS to avoid OOME.
> 
> Do we have configurations against above behaviors? I dig some items from source code & Ignite Web Console, but seems they are not working fine. 
> 
> <property name="fragmentizerConcurrentFiles" value="3"/>
> <property name="dualModeMaxPendingPutsSize" value="10"/>
> <property name="blockSize" value="536870912"/>
> <property name="streamBufferSize" value="131072"/>
> <property name="maxSpaceSize" value="6442450944"/>
> <property name="maximumTaskRangeLength" value="536870912"/>
> <property name="prefetchBlocks" value="2"/>
> <property name="sequentialReadsBeforePrefetch" value="5"/>
> <property name="defaultMode" value="DUAL_ASYNC" />
> 
> I also notice that Ignite write through file block size is set to 64MB. I mean I write a file to Ignite with block size to 4GB, but I finally found it on HDFS with block size 64MB. Is there any configuration for it?
> 
> -----Original Message-----
> From: dkarachentsev [mailto:dkarachentsev@gridgain.com] 
> Sent: Thursday, April 13, 2017 11:21 PM
> To: user@ignite.apache.org
> Subject: Re: OOM when using Ignite as HDFS Cache
> 
> Hi Shuai,
> 
> Could you please take heap dump on OOME and find what objects consume memory? There would be a lot of byte[] objects, please find the nearest GC root for them.
> 
> Thanks!
> 
> -Dmitry.
> 
> 
> 
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11956.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
> 

RE: OOM when using Ignite as HDFS Cache

Posted by zh...@gmail.com.
I think it's a kind of misconfiguration. The Ignite documentation only describes how to configure HDFS as a secondary file system, but says nothing about how to restrict memory usage to avoid an OOME: https://apacheignite.readme.io/v1.0/docs/igfs-secondary-file-system
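
For context, my IGFS is wired to HDFS roughly along the lines of the example on that page; the class names are taken from the documentation, and the host and port here are placeholders:

<property name="fileSystemConfiguration">
    <list>
        <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
            <property name="name" value="igfs"/>
            <property name="defaultMode" value="DUAL_ASYNC"/>
            <property name="secondaryFileSystem">
                <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                    <constructor-arg value="hdfs://namenode-host:9000"/>
                </bean>
            </property>
        </bean>
    </list>
</property>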

Assume I configured the max JVM heap size to 1GB.
1. What happens if I write faster than Ignite can asynchronously flush data to HDFS?
2. What happens if I want to write a 2GB file block to Ignite?

What I expected:
1. Ignite would slow down the writes to avoid an OOME.
2. Ignite would break the 2GB file block into 512MB blocks and write them to HDFS to avoid an OOME.

Are there configuration options for the behaviors above? I dug some items out of the source code and the Ignite Web Console, but they do not seem to work as expected.

<property name="fragmentizerConcurrentFiles" value="3"/>
<property name="dualModeMaxPendingPutsSize" value="10"/>
<property name="blockSize" value="536870912"/>
<property name="streamBufferSize" value="131072"/>
<property name="maxSpaceSize" value="6442450944"/>
<property name="maximumTaskRangeLength" value="536870912"/>
<property name="prefetchBlocks" value="2"/>
<property name="sequentialReadsBeforePrefetch" value="5"/>
<property name="defaultMode" value="DUAL_ASYNC" />

I also noticed that the block size used when Ignite writes files through to HDFS is 64MB. I mean that I write a file to Ignite with a 4GB block size, but I eventually find it on HDFS with a 64MB block size. Is there any configuration for this?

-----Original Message-----
From: dkarachentsev [mailto:dkarachentsev@gridgain.com] 
Sent: Thursday, April 13, 2017 11:21 PM
To: user@ignite.apache.org
Subject: Re: OOM when using Ignite as HDFS Cache

Hi Shuai,

Could you please take heap dump on OOME and find what objects consume memory? There would be a lot of byte[] objects, please find the nearest GC root for them.

Thanks!

-Dmitry.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11956.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: OOM when using Ignite as HDFS Cache

Posted by dkarachentsev <dk...@gridgain.com>.
Hi Shuai,

Could you please take a heap dump on the OOME and find which objects consume
the memory? There will be a lot of byte[] objects; please find the nearest GC
root for them.
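
If it helps, the dump can be captured automatically at the moment of the OOME by adding the standard HotSpot flags to the Ignite JVM options (the path is just an example):

  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/var/tmp/ignite-oome.hprof

The resulting .hprof file can then be opened in a tool such as Eclipse MAT to see which byte[] arrays dominate and what their GC roots are.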

Thanks!

-Dmitry.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOM-when-using-Ignite-as-HDFS-Cache-tp11900p11956.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: OOM when using Ignite as HDFS Cache

Posted by 张帅 <sa...@gmail.com>.
Ping…

 

From: 张帅 [mailto:satan.student@gmail.com] On Behalf Of zhangshuai.ustc@gmail.com
Sent: Wednesday, April 12, 2017 5:29 PM
To: user@ignite.apache.org
Subject: OOM when using Ignite as HDFS Cache

 

Hi there,

 

I’d like to use Ignite as an HDFS cache in my cluster, but it fails with an OOM error. Could you help review my configuration so I can avoid it?

 

I’m using DUAL_ASYNC mode, and the Ignite nodes can find each other to establish the cluster. There are very few changes in default-config.xml, but it is attached for your review. The JVM heap size is limited to 1GB. Ignite suffers from an OOM exception when I run the Hadoop benchmark TestDFSIO writing 4*4GB files. I think a 4GB file is written to HDFS as a stream, so Ignite should be able to handle it. It’s acceptable to slow down the write performance while Ignite flushes cached data to HDFS, but it is not acceptable for this to lead to a crash or data loss.

 

The Ignite log is attached as ignite_log.zip; some key messages are excerpted here:

 

17/04/12 00:49:17 INFO [grid-timeout-worker-#19%null%] internal.IgniteKernal: 

Metrics for local node (to disable set 'metricsLogFrequency' to 0)

    ^-- Node [id=9b5dcc35, name=null, uptime=00:26:00:254]

    ^-- H/N/C [hosts=173, nodes=173, CPUs=2276]

    ^-- CPU [cur=0.13%, avg=0.82%, GC=0%]

    ^-- Heap [used=555MB, free=43.3%, comm=979MB]

    ^-- Non heap [used=61MB, free=95.95%, comm=62MB]

    ^-- Public thread pool [active=0, idle=0, qSize=0]

    ^-- System thread pool [active=0, idle=6, qSize=0]

    ^-- Outbound messages queue [size=0]

17/04/12 00:50:06 INFO [disco-event-worker-#35%null%] discovery.GridDiscoveryManager: Added new node to topology: TcpDiscoveryNode [id=553b5c1a-da0b-43cb-b691-b842352b3105, addrs=[0:0:0:0:0:0:0:1, 10.152.133.46, 10.55.68.223, 127.0.0.1, 192.168.1.1], sockAddrs=[BN1APS0A98852E/10.152.133.46:47500, bn1sch010095221.phx.gbl/10.55.68.223:47500, /0:0:0:0:0:0:0:1:47500, /192.168.1.1:47500, /127.0.0.1:47500], discPort=47500, order=176, intOrder=175, lastExchangeTime=1491983403106, loc=false, ver=2.0.0#20170405-sha1:2c830b0d, isClient=false]

[00:50:06] Topology snapshot [ver=176, servers=174, clients=0, CPUs=2288, heap=180.0GB]

...

Exception in thread "igfs-client-worker-2-#585%null%" java.lang.OutOfMemoryError: GC overhead limit exceeded

  at java.util.Arrays.copyOf(Arrays.java:3332)

  at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)

  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)

  at java.lang.StringBuffer.append(StringBuffer.java:270)

  at java.io.StringWriter.write(StringWriter.java:112)

  at java.io.PrintWriter.write(PrintWriter.java:456)

  at java.io.PrintWriter.write(PrintWriter.java:473)

  at java.io.PrintWriter.print(PrintWriter.java:603)

  at java.io.PrintWriter.println(PrintWriter.java:756)

  at java.lang.Throwable$WrappedPrintWriter.println(Throwable.java:764)

  at java.lang.Throwable.printStackTrace(Throwable.java:658)

  at java.lang.Throwable.printStackTrace(Throwable.java:721)

  at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)

  at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)

  at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)

  at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:162)

  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)

  at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

  at org.apache.log4j.Category.callAppenders(Category.java:206)

  at org.apache.log4j.Category.forcedLog(Category.java:391)

  at org.apache.log4j.Category.error(Category.java:322)

  at org.apache.ignite.logger.log4j.Log4JLogger.error(Log4JLogger.java:495)

  at org.apache.ignite.internal.GridLoggerProxy.error(GridLoggerProxy.java:148)

  at org.apache.ignite.internal.util.IgniteUtils.error(IgniteUtils.java:4281)

  at org.apache.ignite.internal.util.IgniteUtils.error(IgniteUtils.java:4306)

  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:126)

  at java.lang.Thread.run(Thread.java:745)

Exception in thread "LeaseRenewer:hadoop@namenode-vip.yarn3-dev-bn2.bn2.ap.gbl" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception in thread "igfs-delete-worker%igfs%9b5dcc35-3a4c-4a90-ac9e-89fdd65302a7%" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception in thread "exchange-worker-#39%null%" java.lang.OutOfMemoryError: GC overhead limit exceeded

…

17/04/12 01:40:10 WARN [disco-event-worker-#35%null%] discovery.GridDiscoveryManager: Stopping local node according to configured segmentation policy.

 

Looking forward to your help.

 

 

Regards,

Shuai Zhang