Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:12:49 UTC

[GitHub] [hudi] dwshmilyss opened a new issue #3107: [SUPPORT]

dwshmilyss opened a new issue #3107:
URL: https://github.com/apache/hudi/issues/3107


   Use Spark to Hudi and JVM Metaspace OOM
   
   - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? No
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   When I use the Spark API to write data to Hudi, the JVM's Metaspace keeps growing until the process fails with an OutOfMemoryError.
   ![image](https://user-images.githubusercontent.com/8295288/122516043-d847ca80-d040-11eb-8497-7b24eb55516a.png)
   Here's my code:
   ![image](https://user-images.githubusercontent.com/8295288/122515257-bd288b00-d03f-11eb-9977-bfb198c04f49.png)
   
   I then ran this method in a loop 1000 times and found that on each iteration the JVM loads more classes named GeneratedSerializationConstructorAccessor, which look like they are generated by reflection.
   ![image](https://user-images.githubusercontent.com/8295288/122515494-185a7d80-d040-11eb-9fbc-07379ccaccbd.png)
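
   One way to quantify the growth described above without a profiler is to read the JVM's class-loading counters around the loop. The sketch below is not the code from the screenshot: `ClassLoadProbe`, `classesLoadedDuring`, and the empty placeholder task are hypothetical names, and the actual Hudi write would have to be substituted for the placeholder. It relies only on the standard `java.lang.management` API.

   ```java
   import java.lang.management.ClassLoadingMXBean;
   import java.lang.management.ManagementFactory;

   /** Hypothetical helper: runs a task repeatedly and reports class-loading growth. */
   final class ClassLoadProbe {

       /** Snapshot of the JVM-wide class-loading counters. */
       static final class Snapshot {
           final int loadedNow;     // classes currently loaded
           final long totalLoaded;  // classes loaded since JVM start (never decreases)
           final long unloaded;     // classes unloaded since JVM start

           Snapshot(ClassLoadingMXBean bean) {
               this.loadedNow = bean.getLoadedClassCount();
               this.totalLoaded = bean.getTotalLoadedClassCount();
               this.unloaded = bean.getUnloadedClassCount();
           }
       }

       static Snapshot snapshot() {
           return new Snapshot(ManagementFactory.getClassLoadingMXBean());
       }

       /** Runs the task the given number of times and returns how many classes were loaded meanwhile. */
       static long classesLoadedDuring(Runnable task, int iterations) {
           long before = snapshot().totalLoaded;
           for (int i = 0; i < iterations; i++) {
               task.run();
           }
           return snapshot().totalLoaded - before;
       }

       public static void main(String[] args) {
           // Placeholder (assumption): replace the body with the method that writes to Hudi.
           Runnable placeholder = () -> { };
           long delta = classesLoadedDuring(placeholder, 1000);
           System.out.println("classes loaded during loop: " + delta);
       }
   }
   ```

   If the reported number keeps rising roughly in proportion to the iteration count, that suggests classes are being generated on every iteration rather than once.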
   These are my JVM startup parameters:
   ```
   -Xmn400m
   -Xms2000m
   -Xmx2000m
   -XX:SurvivorRatio=2
   -XX:MetaspaceSize=256m
   -XX:MaxMetaspaceSize=256m
   -XX:+CMSClassUnloadingEnabled
   -XX:SoftRefLRUPolicyMSPerMB=1000
   -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
   -XX:+UseCompressedOops
   -XX:+UseConcMarkSweepGC
   -XX:+UseParNewGC
   -XX:CMSInitiatingOccupancyFraction=70
   -XX:+UseCMSInitiatingOccupancyOnly
   -XX:+UnlockDiagnosticVMOptions
   -XX:+HeapDumpOnOutOfMemoryError
   -Dsun.reflect.inflationThreshold=2147483647
   -XX:+TraceClassLoading
   -XX:+TraceClassUnloading
   -XX:HeapDumpPath=/Users/edz/Desktop/heapDump1.hprof
   ```
   This is the number of classes loaded by the JVM:
   ![image](https://user-images.githubusercontent.com/8295288/122518129-69b83c00-d043-11eb-9304-8806c51303d3.png)
   The GeneratedSerializationConstructorAccessor classes are loaded by sun.reflect.DelegatingClassLoader. They remain loaded until a Full GC unloads them, which results in frequent Full GCs.
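
   Whether these classes are actually reclaimed can be checked by sampling the Metaspace pool and the unloaded-class counter before and after a collection. This is a minimal sketch, assuming a HotSpot JVM where the memory pool is named `Metaspace` (other JVMs may use a different name, in which case the lookup returns empty). `MetaspaceProbe` and its methods are hypothetical names.

   ```java
   import java.lang.management.ManagementFactory;
   import java.lang.management.MemoryPoolMXBean;
   import java.lang.management.MemoryUsage;
   import java.util.Optional;

   /** Hypothetical helper: reports Metaspace usage and the unloaded-class count. */
   final class MetaspaceProbe {

       /** Finds the pool named "Metaspace" (HotSpot naming, an assumption) and returns its usage. */
       static Optional<MemoryUsage> metaspaceUsage() {
           for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
               if ("Metaspace".equals(pool.getName())) {
                   return Optional.ofNullable(pool.getUsage());
               }
           }
           return Optional.empty();
       }

       /** One-line summary: used and max Metaspace, plus classes unloaded since JVM start. */
       static String describe() {
           long unloaded = ManagementFactory.getClassLoadingMXBean().getUnloadedClassCount();
           String usage = metaspaceUsage()
                   .map(u -> (u.getUsed() / 1024) + " KiB used, max "
                           + (u.getMax() < 0 ? "undefined" : (u.getMax() / 1024) + " KiB"))
                   .orElse("Metaspace pool not found");
           return usage + ", classes unloaded so far: " + unloaded;
       }

       public static void main(String[] args) {
           System.out.println("before: " + describe());
           System.gc(); // only a request; what it triggers depends on the collector and flags in use
           System.out.println("after:  " + describe());
       }
   }
   ```

   Calling `describe()` periodically from the looping job would show whether used Metaspace drops and the unloaded count jumps each time a Full GC runs.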
   Tracing the code, I found that this reflection is triggered by Spark's transformation operators: an operator such as map calls sc.clean() at the beginning, and inside it the following method is called:
   ![image](https://user-images.githubusercontent.com/8295288/122519650-44c4c880-d045-11eb-838e-d5fff8f2048a.png)
   
   Can anyone please help us fix this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #3107: [SUPPORT]Use Spark to Hudi and JVM Metaspace OOM

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3107:
URL: https://github.com/apache/hudi/issues/3107#issuecomment-891905297


   awesome find! thanks. 





[GitHub] [hudi] nsivabalan closed issue #3107: [SUPPORT]Use Spark to Hudi and JVM Metaspace OOM

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3107:
URL: https://github.com/apache/hudi/issues/3107


   

