Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2021/04/07 10:35:36 UTC

[GitHub] [skywalking] nisiyong opened a new issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

nisiyong opened a new issue #6703:
URL: https://github.com/apache/skywalking/issues/6703


   Please answer these questions before submitting your issue.
   
   - Why do you submit this issue?
     - [ ] Question or discussion
     - [ ] Bug
     - [ ] Requirement
     - [x] Feature or performance improvement
   
   ___
   ### Requirement or improvement
   
   The SkyWalking Java Agent is a powerful instrumentation tool that makes it much easier to build a tracing system.
   
   We have run SkyWalking with our Java applications in production for several months, and it mostly runs fine. Recently, we found that some applications suffer from frequent GC and some hit OOM. We dumped the heap and, using [Memory Analyzer (MAT)](https://www.eclipse.org/mat/), found a large number of `TraceSegmentRef` objects in it. Here are two cases:
   
   #### Case 1: Frequent GC
   
   In this case, the app has 1000 Dubbo handler threads, and each handler performs many RPCs and DB operations.
   - JVM Max Heap:  8g
   - Machine: 8 core 16g
   - SkyWalking Agent: 8.4.0, collect all traces
   
   ![image](https://user-images.githubusercontent.com/8198862/113844121-bfac7000-97c6-11eb-8db2-580863c87cca.png)
   
   ![image](https://user-images.githubusercontent.com/8198862/113843193-f5048e00-97c5-11eb-8cd5-75a1ad543cc7.png)
   
   
   #### Case 2: OOM
   
   In this case, the app has 20 RocketMQ consumer threads, and each consumer thread performs some RPCs and DB operations.
   - JVM Max Heap:  8g
   - Machine: 8 core 16g
   - SkyWalking Agent: 8.4.0, collect all traces
   
   ![image](https://user-images.githubusercontent.com/8198862/113843899-912e9500-97c6-11eb-93db-4a955b9129fa.png)
   
   ![image](https://user-images.githubusercontent.com/8198862/113843489-31d08500-97c6-11eb-9492-9bfa28284c81.png)
   
   ---
   
   On the application side, I think there are 3 reasons:
   1. A sudden spike in throughput keeps all threads busy handling requests.
   2. Each request performs many RPCs and DB operations, which creates a lot of spans.
   3. Requests are handled slowly; some take 10s or even more.
   
   On the agent side, I have read the source code and understand some of the design:
   - A `Segment`, in SkyWalking terms, is the object stored in the RingBuffer on the client side, and a SkyWalking consumer thread drains the RingBuffer and sends the data to the OAP.
   - A `Segment` is built before it is put in the RingBuffer. Each request creates some spans, which are kept in a stack; the `Segment` is not finished until the stack is empty, which means the request has finished in the application. This can take some time. Meanwhile, the data is held in a thread-local, and the garbage collector cannot collect it before the request finishes.
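
To make the description above concrete, here is a minimal Java sketch of that lifecycle as the reporter describes it. It is not the agent's actual code: `SegmentSketch`, `Span`, `Segment`, `Context`, and the `ArrayBlockingQueue` standing in for the RingBuffer are all hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: every type here is hypothetical, not a real SkyWalking class.
final class SegmentSketch {
    /** One traced operation (an RPC, a DB call, ...) inside a request. */
    static final class Span {
        final String operation;
        Span(String operation) { this.operation = operation; }
    }

    /** All spans produced by one request on one thread. */
    static final class Segment {
        final List<Span> finishedSpans = new ArrayList<>();
    }

    /** Per-thread state: the stack of active spans and the segment being built. */
    static final class Context {
        final Deque<Span> activeSpans = new ArrayDeque<>();
        final Segment segment = new Segment();
    }

    // Stand-in for the ring buffer: a bounded queue that rejects new items when full.
    static final BlockingQueue<Segment> BUFFER = new ArrayBlockingQueue<>(1024);
    static final ThreadLocal<Context> CONTEXT = new ThreadLocal<>();

    static void startSpan(String operation) {
        Context ctx = CONTEXT.get();
        if (ctx == null) {
            ctx = new Context();
            CONTEXT.set(ctx);
        }
        ctx.activeSpans.push(new Span(operation));
    }

    static void stopSpan() {
        Context ctx = CONTEXT.get();
        if (ctx == null) {
            return; // nothing is being traced on this thread
        }
        ctx.segment.finishedSpans.add(ctx.activeSpans.pop());
        if (ctx.activeSpans.isEmpty()) {
            // The request is done: hand the segment over and release the thread-local.
            BUFFER.offer(ctx.segment);
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        startSpan("entry");
        startSpan("db-query");
        stopSpan();
        System.out.println("buffered while request is running: " + BUFFER.size()); // 0
        stopSpan();
        System.out.println("buffered after request finished: " + BUFFER.size());  // 1
    }
}
```

Until the last `stopSpan()` call, every finished span stays reachable through the thread-local, which is the retention behaviour described above: the longer a request runs and the more spans it creates, the more memory that thread holds.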
   
   I wonder why the segment is put in the ring buffer; could we put the span there instead? I am not familiar with the purpose of the Segment design.
   I know we should improve our application at the same time, but in some scenarios people can tolerate slow request handling. So what can the SkyWalking Java Agent do in such extreme scenarios? Application availability is very important, and none of us wants the APM instrumentation to occupy a lot of memory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking] zifeihan commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
zifeihan commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815556702


   I think the reason for this is that an exception occurred in a plug-in, which caused threadlocal to not be cleaned up, resulting in `org.apache.skywalking.apm.agent.core.context.trace.TraceSegmentRef` constantly being added.
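
The failure mode suggested here can be sketched in a few lines of Java. This is only an illustration of the general pattern, under the assumption that cleanup is skipped when an exception escapes; `ThreadLocalLeakSketch`, `Context`, `handleLeaky`, and `handleSafe` are hypothetical names, not SkyWalking APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical types, not the agent's real classes.
final class ThreadLocalLeakSketch {
    /** Hypothetical per-thread tracing context that collects parent references. */
    static final class Context {
        final List<String> refs = new ArrayList<>();
    }

    static final ThreadLocal<Context> CONTEXT = ThreadLocal.withInitial(Context::new);

    /** Leaky variant: cleanup is skipped whenever the handler throws. */
    static void handleLeaky(String parentRef, Runnable handler) {
        CONTEXT.get().refs.add(parentRef);
        handler.run();
        CONTEXT.remove(); // never reached if handler.run() throws
    }

    /** Safe variant: cleanup runs whether or not the handler throws. */
    static void handleSafe(String parentRef, Runnable handler) {
        CONTEXT.get().refs.add(parentRef);
        try {
            handler.run();
        } finally {
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> {
            throw new IllegalStateException("simulated plugin failure");
        };

        // A pooled thread serves many requests; here one thread serves three.
        for (int i = 0; i < 3; i++) {
            try {
                handleLeaky("parent-" + i, failing);
            } catch (IllegalStateException ignored) {
                // the request failed, but the thread goes back to the pool
            }
        }
        System.out.println("leaky refs retained: " + CONTEXT.get().refs.size()); // 3

        CONTEXT.remove();
        for (int i = 0; i < 3; i++) {
            try {
                handleSafe("parent-" + i, failing);
            } catch (IllegalStateException ignored) {
                // same failure, but the context was cleared in finally
            }
        }
        System.out.println("safe refs retained: " + CONTEXT.get().refs.size()); // 0
    }
}
```

On a long-lived pooled thread, the leaky variant keeps appending to the same context across requests, which would match a reference list that grows without bound.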





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815818472


   Take it easy. Do you mean I should open a new issue to ask another question?





[GitHub] [skywalking] WildWolfBang edited a comment on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
WildWolfBang edited a comment on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815501380


   > I want to add a caution about your analysis: this graph shows that you used 409M of memory for 8m SegmentRef instances.
   
   ![image](https://user-images.githubusercontent.com/17874410/113977551-559ed400-9875-11eb-927f-d0bb266db0fb.png)
   ![image](https://user-images.githubusercontent.com/17874410/113976498-d0ff8600-9873-11eb-8ec1-1296fd987699.png)
   
     @wu-sheng Hi, I noticed that the dominator tree view shows `private List<TraceSegmentRef> refs` containing about 511360 entries in one ThreadLocal, so I expanded the "Class Name" column down to `java.lang.String` or `char[]` to confirm the real memory usage, at which point "Shallow Heap" equals "Retained Heap". The list actually uses about 356M in one ThreadLocal. The situation is the same for the other Dubbo threads, sorted by percentage.
   The docs say `private List<TraceSegmentRef> refs` is used to link multiple parent trace segments; a segment with hundreds of thousands of parents is unusual unless a loop occurs.
   
   ```
   public class TraceSegment {
       private String traceSegmentId;
       /**
        * The refs of parent trace segments, except the primary one. For most RPC call, {@link #refs} contains only one
        * element, but if this segment is a start span of batch process, the segment faces multi parents, at this moment,
        * we use this {@code #refs} to link them.
        * <p>
        * This field will not be serialized. Keeping this field is only for quick accessing.
        */
       private List<TraceSegmentRef> refs;
   ...
   }
   ```
   
   https://dzone.com/articles/eclipse-mat-shallow-heap-retained-heap
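
For a rough sense of scale, the figures quoted in this thread can be turned into per-object averages. The sketch below is plain arithmetic under two assumptions that the thread does not confirm: "M" is read as 10^6 bytes and "8m" as 8 million instances. The two figures also come from different views of the heap dump, so they are not directly comparable. `HeapAverages` and `averageBytes` are hypothetical names.

```java
// Illustrative arithmetic only; the unit interpretation is an assumption.
final class HeapAverages {
    /** Integer average size in bytes per object. */
    static long averageBytes(long totalBytes, long count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        return totalBytes / count;
    }

    public static void main(String[] args) {
        // 409M reported for 8m SegmentRef instances.
        System.out.println(averageBytes(409_000_000L, 8_000_000L)); // 51
        // About 356M reported for a list of 511360 entries in one thread.
        System.out.println(averageBytes(356_000_000L, 511_360L));   // 696
    }
}
```

Roughly 51 bytes per instance is what an object's own fields would plausibly cost (shallow size), while roughly 696 bytes per list entry reflects everything each entry keeps alive, such as its strings (retained size).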
   
   
   





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-819193743


   @nisiyong This issue is going to be closed once #6715 gets merged; it has 2 approvals already.
   
   To other people reading this issue: that PR is a precautionary measure, rather than a real bug fix or a resolution of this particular issue.





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815813961


   > @wu-sheng
   > I found that no source code reads `TraceSegmentRef`; only the test code reads it.
   > The agent adds a `TraceSegmentRef` in the JVM for each request. Do we still need `TraceSegmentRef` now, or is it only for test code?
   > 
   > ![image](https://user-images.githubusercontent.com/8198862/114031905-66b80700-98ae-11eb-8389-7ab603458378.png)
   
   Why are we starting to discuss source code here?





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814973877


   > But the heap dump shows that most of the cached data is SkyWalking Segments.
   
   Don't think of it that way. Yes, there are many objects, but what do you think is expected?





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814963138


   > Also, your code actually caches much more data than we do for the tracing context.
   
   But the heap dump shows that most of the cached data is SkyWalking Segments.
   
   > Unless, like in your case, I would say you must put the parameter collection by yourself.
   
   I do not understand what you mean. Could you tell me more information about that?
   
   > Unless you can confirm that you really have that many RPCs or MQ messages being consumed at this point, you are facing a plugin bug and memory leak, rather than a design or code issue.
   
   Thank you for pointing that out. I will do more analysis and check the plugins.





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814936834


   > Maybe you can use `agent.sample_n_per_3_secs` to limit.
   
   Thank you, I have already set it, and I have also set `agent.span_limit_per_segment` to less than 300.
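
The two settings mentioned here live in the agent's `agent.config` file. A fragment might look like the following; the values are purely illustrative assumptions, not recommendations, and defaults can differ between agent versions.

```properties
# Number of traces sampled every 3 seconds; zero or negative turns sampling limits off.
# The value 100 is an illustrative assumption.
agent.sample_n_per_3_secs=100

# Maximum number of spans kept in a single segment.
# The value 200 is an illustrative assumption ("less than 300").
agent.span_limit_per_segment=200
```

Both settings bound how much tracing data the agent records, but neither would help if references accumulate in a context that is never cleared.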





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-818319518


   Thanks for the update.





[GitHub] [skywalking] liqiangz commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
liqiangz commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814841599


   Maybe you can use `agent.sample_n_per_3_secs`  to  limit.





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-819242375


   > @nisiyong This issue is going to be closed once #6715 gets merged, it has 2 approvals already.
   
   That is OK. If another problem comes up, I will open a new issue and link it to this one.





[GitHub] [skywalking] zifeihan commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
zifeihan commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815597774


   > Thanks, I will take a look. But note the 2 cases above: the 1st is about Dubbo threads and the 2nd is about RocketMQ consumer threads. I do not believe both plugins have the same problem. Anyway, I will do more analysis.
   
   Thank you, I will try to analyze and solve the problem.





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815821570


   Okay, get it.





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815593089


   > I think the reason for this is that an exception occurred in a plugin, which caused threadLocal to not be cleaned up, resulting in `org.apache.skywalking.apm.agent.core.context.trace.TraceSegmentRef` constantly being added.
   
   Thanks, I will take a look. But note the 2 cases above: the 1st is about Dubbo threads and the 2nd is about RocketMQ consumer threads. I do not believe both plugins have the same problem. Anyway, I will do more analysis.





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815820632


   > Take it easy. Do you mean I should open a new issue to ask another question?
   
   Yes. If you want to dig deeper purely into the code, such as optimizing or polishing it for clarity or efficiency, let's do that in a discussion about the code only, unless you find that this code is related to this memory leak.





[GitHub] [skywalking] wu-sheng closed issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng closed issue #6703:
URL: https://github.com/apache/skywalking/issues/6703


   





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814973259


   > I do not understand what you mean. Could you tell me more information about that?
   
   Collect HTTP parameters, SQL parameters, etc.





[GitHub] [skywalking] nisiyong edited a comment on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong edited a comment on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815808429


   @wu-sheng 
   I found that no source code reads `TraceSegmentRef`; only the test code reads it.
   The agent adds a `TraceSegmentRef` in the JVM for each request. Do we still need `TraceSegmentRef` now, or is it only for test code?
   
   ![image](https://user-images.githubusercontent.com/8198862/114031905-66b80700-98ae-11eb-8389-7ab603458378.png)
   





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814875224


   ![image](https://user-images.githubusercontent.com/5441976/113866359-d01d1480-97df-11eb-8ec5-ee21f9955669.png)
   
   I want to add a caution about your analysis: this graph shows that you used 409M of memory for 8m SegmentRef instances. Unless you can confirm that you really have that many RPCs or MQ messages being consumed at this point, you are facing a plugin bug and memory leak, rather than a design or code issue.





[GitHub] [skywalking] wu-sheng commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-814871986


   Let's talk about this from a different perspective.
   > So what can the SkyWalking Java Agent do in such extreme scenarios? Application availability is very important, and none of us wants the APM instrumentation to occupy a lot of memory.
   
   The SkyWalking agent limits the number of spans per segment. You can find the setting in the agent configuration; it is usually 300 spans per segment.
   
   > I wonder why the segment is put in the ring buffer; could we put the span there instead? I am not familiar with the purpose of the Segment design.
   
   Without this concept, you can't tell the links (metrics) between the entry span and the exit spans; for example, one day we may need the number of database operations per HTTP request. Also, your code actually caches much more data than we do for the tracing context.
   **Unless, like in your case**, I would say you must put the parameter collection by yourself. 





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815617426


   If this problem is about plugins, let me provide some information here: we use 44 official 8.4.0 plugins plus `apm-hbase-1.x-2.x-plugin-8.4.0.jar`.
   
   We cherry-picked the HBase commit from the current master. Related: #6577
   
   These are all plugins we use:
   ```
   apm-dubbo-2.7.x-plugin-8.4.0.jar
   apm-dubbo-plugin-8.4.0.jar
   apm-elasticsearch-5.x-plugin-8.4.0.jar
   apm-elasticsearch-6.x-plugin-8.4.0.jar
   apm-hbase-1.x-2.x-plugin-8.4.0.jar
   apm-httpasyncclient-4.x-plugin-8.4.0.jar
   apm-httpclient-3.x-plugin-8.4.0.jar
   apm-httpClient-4.x-plugin-8.4.0.jar
   apm-jdbc-commons-8.4.0.jar
   apm-jedis-2.x-plugin-8.4.0.jar
   apm-jetty-client-9.0-plugin-8.4.0.jar
   apm-jetty-client-9.x-plugin-8.4.0.jar
   apm-jetty-server-9.x-plugin-8.4.0.jar
   apm-kafka-commons-8.4.0.jar
   apm-kafka-plugin-8.4.0.jar
   apm-lettuce-5.x-plugin-8.4.0.jar
   apm-mongodb-3.x-plugin-8.4.0.jar
   apm-mongodb-4.x-plugin-8.4.0.jar
   apm-mysql-5.x-plugin-8.4.0.jar
   apm-mysql-6.x-plugin-8.4.0.jar
   apm-mysql-8.x-plugin-8.4.0.jar
   apm-mysql-commons-8.4.0.jar
   apm-okhttp-3.x-plugin-8.4.0.jar
   apm-redisson-3.x-plugin-8.4.0.jar
   apm-resttemplate-4.3.x-plugin-8.4.0.jar
   apm-rocketmq-4.x-plugin-8.4.0.jar
   apm-sharding-sphere-4.1.0-plugin-8.4.0.jar
   apm-spring-async-annotation-plugin-8.4.0.jar
   apm-spring-cloud-feign-1.x-plugin-8.4.0.jar
   apm-spring-cloud-feign-2.x-plugin-8.4.0.jar
   apm-spring-concurrent-util-4.x-plugin-8.4.0.jar
   apm-spring-core-patch-8.4.0.jar
   apm-spring-kafka-1.x-plugin-8.4.0.jar
   apm-spring-kafka-2.x-plugin-8.4.0.jar
   apm-springmvc-annotation-3.x-plugin-8.4.0.jar
   apm-springmvc-annotation-4.x-plugin-8.4.0.jar
   apm-springmvc-annotation-5.x-plugin-8.4.0.jar
   apm-springmvc-annotation-commons-8.4.0.jar
   apm-spring-scheduled-annotation-plugin-8.4.0.jar
   apm-vertx-core-3.x-plugin-8.4.0.jar
   dubbo-2.7.x-conflict-patch-8.4.0.jar
   dubbo-conflict-patch-8.4.0.jar
   spring-commons-8.4.0.jar
   spring-webflux-5.x-webclient-plugin-8.4.0.jar
   tomcat-7.x-8.x-plugin-8.4.0.jar
   ```





[GitHub] [skywalking] WildWolfBang edited a comment on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
WildWolfBang edited a comment on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-815501380


   > I want to send a warning to your analysis, this graph shows, you used 409M memory for 8m SegmentRef instances. 
   
   ![image](https://user-images.githubusercontent.com/17874410/113977551-559ed400-9875-11eb-927f-d0bb266db0fb.png)
   ![image](https://user-images.githubusercontent.com/17874410/113976498-d0ff8600-9873-11eb-8ec1-1296fd987699.png)
   
   @wu-sheng Hi, I noticed that the dominator tree view shows `private List<TraceSegmentRef> refs` holding about 511360 entries in one ThreadLocal, so I expanded the "Class Name" column down to `java.lang.String` or `char[]` to confirm the real memory usage; at that level "Shallow Heap" equals "Retained Heap". The list actually uses about 356M in one ThreadLocal. The situation is the same for the other Dubbo threads, which are sorted by percentage.
   The docs say `private List<TraceSegmentRef> refs` is used to link multiple parent trace segments; a segment with hundreds of thousands of parents is unusual unless a loop occurs.
   
   ```
   public class TraceSegment {
       private String traceSegmentId;
       /**
        * The refs of parent trace segments, except the primary one. For most RPC call, {@link #refs} contains only one
        * element, but if this segment is a start span of batch process, the segment faces multi parents, at this moment,
        * we use this {@code #refs} to link them.
        * <p>
        * This field will not be serialized. Keeping this field is only for quick accessing.
        */
       private List<TraceSegmentRef> refs;
       // ...
   }
   ```
   
   https://dzone.com/articles/eclipse-mat-shallow-heap-retained-heap
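
To make the "unusual unless a loop occurs" point concrete, here is a minimal sketch of how a per-thread parent list can grow without bound. It assumes a simplified tracing context; `SegmentSketch`, `RefsGrowthSketch`, `handle` and `simulate` are hypothetical names for illustration and are not the SkyWalking agent's classes or its actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a trace segment; NOT the agent's TraceSegment.
class SegmentSketch {
    final List<String> refs = new ArrayList<>();
}

class RefsGrowthSketch {
    // One in-flight segment per thread, as a tracing context might keep it (assumption).
    private static final ThreadLocal<SegmentSketch> CURRENT = new ThreadLocal<>();

    /** Handles one incoming request and returns the size of refs after linking the parent. */
    static int handle(String parentSegmentId, boolean finishSegment) {
        SegmentSketch segment = CURRENT.get();
        if (segment == null) {
            segment = new SegmentSketch();
            CURRENT.set(segment);
        }
        segment.refs.add(parentSegmentId);
        int size = segment.refs.size();
        if (finishSegment) {
            CURRENT.remove(); // normal path: the next request starts a fresh segment
        }
        return size;
    }

    /** Runs the given number of requests on the calling thread and returns the last refs size. */
    static int simulate(int requests, boolean finishSegment) {
        CURRENT.remove(); // start from a clean thread
        int last = 0;
        for (int i = 0; i < requests; i++) {
            last = handle("parent-" + i, finishSegment);
        }
        CURRENT.remove(); // clean up so repeated simulations do not interfere
        return last;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000, true));  // prints 1
        System.out.println(simulate(1000, false)); // prints 1000
    }
}
```

When every request finishes its segment, the list never holds more than one parent. When the segment is never finished on a long-lived pooled thread, each request adds another entry to the same list, which would match the shape of the heap dump described above.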
   
   
   





[GitHub] [skywalking] nisiyong commented on issue #6703: The Java Agent may cause frequent GC or OOM in extreme scenarios

Posted by GitBox <gi...@apache.org>.
nisiyong commented on issue #6703:
URL: https://github.com/apache/skywalking/issues/6703#issuecomment-817859169


   Thanks to @zifeihan for the help. We found an ERROR in some agent logs, caused by the missing `apm-httpclient-commons-8.4.0.jar` in the agent plugins folder.
   ```
   ERROR 2021-04-02 18:16:42:833 ConsumeMessageThread_4 InstMethodsInter : class[class org.apache.http.impl.client.InternalHttpClient] after method[doExecute] intercept failure
   java.lang.NoClassDefFoundError: org/apache/skywalking/apm/plugin/httpclient/HttpClientPluginConfig$Plugin$HttpClient
           at org.apache.skywalking.apm.plugin.httpClient.v4.HttpClientExecuteInterceptor.afterMethod(HttpClientExecuteInterceptor.java:98)
           at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:97)
           at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
           at io.searchbox.client.http.JestHttpClient.executeRequest(JestHttpClient.java:136)
           at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:70)
           at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:63)
   ```
   We build the agent ourselves using a Maven profile, and the following mistakes caused this ERROR:
   - The `apm-httpclient-commons` module was missing from the Maven reactor build, because I was not aware that there is a dependency relationship between plugins.
   - We forgot to change the version and still used `8.4.0`. The build succeeded because Maven pulled the `apm-httpclient-commons` dependency from the central repo.
   
   After updating the version to `8.4.0.1`, the build failed; after adding the `apm-httpclient-commons` module, it built successfully.
   We will deploy the new agent this week and dump the business application heap again. Let's see how it performs with the new agent. I will record the final result here in a few days.
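
A check like the following could catch a missing companion jar before deployment. This is only a sketch under stated assumptions: `PluginCompanionCheck`, `REQUIRES` and `missingCompanions` are hypothetical names, and the single plugin-to-commons pair is inferred from the stack trace and the jar names in this thread, not from any official dependency list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PluginCompanionCheck {
    // Plugin jar prefix -> companion jar prefix it needs at runtime.
    // Assumption: only this pair is inferred from the thread; verify and add others yourself.
    static final Map<String, String> REQUIRES = new LinkedHashMap<>();
    static {
        REQUIRES.put("apm-httpClient-4.x-plugin", "apm-httpclient-commons");
    }

    /** Returns companion prefixes required by a present plugin but absent from the jar list. */
    static List<String> missingCompanions(List<String> jarNames) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : REQUIRES.entrySet()) {
            boolean pluginPresent =
                jarNames.stream().anyMatch(name -> name.startsWith(entry.getKey()));
            boolean companionPresent =
                jarNames.stream().anyMatch(name -> name.startsWith(entry.getValue()));
            if (pluginPresent && !companionPresent) {
                missing.add(entry.getValue());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
            "apm-httpClient-4.x-plugin-8.4.0.jar",
            "apm-jdbc-commons-8.4.0.jar");
        System.out.println(missingCompanions(jars)); // prints [apm-httpclient-commons]
    }
}
```

In practice the jar names would come from listing the agent's plugins folder (for example with `java.io.File#list`), and the check would run as part of the custom build so that a non-empty result fails it.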
   
   

