Posted to common-issues@hadoop.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2020/08/20 09:39:00 UTC

[jira] [Commented] (HADOOP-17209) Erasure Coding: Native library memory leak

    [ https://issues.apache.org/jira/browse/HADOOP-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181071#comment-17181071 ] 

Stephen O'Donnell commented on HADOOP-17209:
--------------------------------------------

From the tutorial posted here:

http://www.iitk.ac.in/esc101/05Aug/tutorial/native1.1/implementing/array.html

It does indeed seem that you must call ReleaseIntArrayElements each time you call GetIntArrayElements, so the change makes sense to me. However, I have never used JNI, so my knowledge in this area is limited.
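
For reference, the pattern from that tutorial looks roughly like the following. This is just a minimal sketch to illustrate the Get/Release pairing, not code taken from Hadoop (the function name and array are made up):

{code}
#include <jni.h>

/* Minimal sketch of the GetIntArrayElements / ReleaseIntArrayElements pairing.
 * "env" and "arr" would come from the surrounding JNI call. */
static jint sumIntArray(JNIEnv *env, jintArray arr) {
  jint sum = 0, i;
  jsize len = (*env)->GetArrayLength(env, arr);
  jint *elems = (*env)->GetIntArrayElements(env, arr, NULL);
  if (elems == NULL) {
    return 0; /* OutOfMemoryError has already been thrown */
  }
  for (i = 0; i < len; i++) {
    sum += elems[i];
  }
  /* Every Get must be paired with a Release, otherwise the pinned/copied
   * buffer leaks. JNI_ABORT frees it without copying data back to the
   * Java array, which is what you want for read-only access. */
  (*env)->ReleaseIntArrayElements(env, arr, elems, JNI_ABORT);
  return sum;
}
{code}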

Grepping the code for GetIntArrayElements, I see there are currently 3 occurrences:

{code}
$ pwd
/Users/sodonnell/source/upstream_hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/erasurecode
$ grep GetIntArrayElements *.c
jni_common.c:  tmpInputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_common.c:  tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_rs_decoder.c:  int* tmpErasedIndexes = (int*)(*env)->GetIntArrayElements(env,
{code}

This patch addresses 2 of them. For the remaining one, in the getOutputs function in jni_common.c, do we need a call to ReleaseIntArrayElements there too?

{code}
void getOutputs(JNIEnv *env, jobjectArray outputs, jintArray outputOffsets,
                              unsigned char** destOutputs, int num) {
  int numOutputs = (*env)->GetArrayLength(env, outputs);
  int i, *tmpOutputOffsets;
  jobject byteBuffer;

  if (numOutputs != num) {
    THROW(env, "java/lang/InternalError", "Invalid outputs");
  }

  tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
                                                          outputOffsets, NULL);
  for (i = 0; i < numOutputs; i++) {
    byteBuffer = (*env)->GetObjectArrayElement(env, outputs, i);
    destOutputs[i] = (unsigned char *)((*env)->GetDirectBufferAddress(env,
                                                                  byteBuffer));
    destOutputs[i] += tmpOutputOffsets[i];
  }
}
{code}
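
If a release is needed there, my guess is the end of getOutputs would gain something like the following. This is only a sketch of the idea, not the actual patch; since the offsets are only read, JNI_ABORT should be enough to free the buffer without copying anything back:

{code}
  for (i = 0; i < numOutputs; i++) {
    byteBuffer = (*env)->GetObjectArrayElement(env, outputs, i);
    destOutputs[i] = (unsigned char *)((*env)->GetDirectBufferAddress(env,
                                                                  byteBuffer));
    destOutputs[i] += tmpOutputOffsets[i];
  }
  /* Pair the GetIntArrayElements call above with a release so the pinned
   * or copied buffer is freed. JNI_ABORT discards the native copy without
   * writing anything back, which is fine because the offsets are only read. */
  (*env)->ReleaseIntArrayElements(env, outputOffsets,
                                  (jint *) tmpOutputOffsets, JNI_ABORT);
}
{code}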

[~seanlook] Have you been running with this patch in production for some time, and have all EC operations been working fine with it?

> Erasure Coding: Native library memory leak
> ------------------------------------------
>
>                 Key: HADOOP-17209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17209
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 3.3.0, 3.2.1, 3.1.3
>            Reporter: Sean Chow
>            Assignee: Sean Chow
>            Priority: Major
>         Attachments: HADOOP-17209.001.patch, datanode.202137.detail_diff.5.txt, image-2020-08-15-18-26-44-744.png, image-2020-08-20-12-35-39-906.png
>
>
> We use both {{apache-hadoop-3.1.3}} and {{CDH-6.1.1-1.cdh6.1.1.p0.875250}} HDFS in production, and both of them show memory usage growing beyond the {{-Xmx}} value.
> !image-2020-08-15-18-26-44-744.png!
>  
> We use the EC strategy to save storage costs.
> These are the JVM options:
> {code:java}
> -Dproc_datanode -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError ...{code}
> The max JVM heap size is 8GB, but we can see the datanode RSS memory is 48GB. All the other datanodes in this HDFS cluster have the same issue.
> {code:java}
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
> 226044 hdfs 20 0 50.6g 48g 4780 S 90.5 77.0 14728:27 /usr/java/jdk1.8.0_162/bin/java -Dproc_datanode{code}
>  
> This excessive memory usage makes the machine unresponsive (if swap is enabled), or the oom-killer is triggered.
>  


