Posted to dev@skywalking.apache.org by 吴延涛 <wu...@126.com> on 2019/08/06 12:31:24 UTC

OAP server's performance issue

Hi,
      Wu Sheng, I am a developer at Tongcheng. I found some performance issues when using SkyWalking in our production environment.
      Our SkyWalking version is 6.0.0-GA. I found three issues:
      1. After the OAP server has been running for a while, the SkyWalking UI can no longer obtain trace data, and ES does not have the data either. After the OAP server
is restarted, the UI can obtain the data again. I took a thread dump while the OAP server was not working, and when I analyzed the dump file,
I found that the gRPC thread that puts trace data into the buffer was in a sleeping state, and the persistence worker that consumes the buffer was also blocked.
So I think the buffer is the problem. The buffer's producer and consumer run in two different threads, and when the producer saves data
into the buffer, the consumer does not know that the buffer has data. This is a visibility issue in multithreaded code. So I changed the buffer array to an
ArrayBlockingQueue, as shown in the screenshot I attached.
After this change everything runs well, and performance also improved noticeably.
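The actual patch is not in the thread (the screenshot never reached the list). As a minimal sketch of the idea, assuming a hypothetical `QueueBuffer` class standing in for the OAP's internal trace buffer (it is not SkyWalking's real class), the producer saves with the non-blocking `offer` and the consumer drains with `drainTo`. The `java.util.concurrent` memory-consistency rules guarantee that whatever a thread does before placing an element into a concurrent collection happens-before another thread's removal of it, so no extra `volatile` fields or locks are needed around the buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical bounded buffer shared by a producer thread (e.g. a gRPC handler)
 * and a consumer thread (e.g. a persistence worker). Not SkyWalking's real class.
 */
public class QueueBuffer<T> {
    private final BlockingQueue<T> queue;

    public QueueBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking save: returns false, dropping the item, if the buffer is full. */
    public boolean save(T item) {
        return queue.offer(item);
    }

    /** Moves everything currently buffered into a new list for persistence. */
    public List<T> obtain() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        QueueBuffer<String> buffer = new QueueBuffer<>(100);

        // Producer: stands in for a gRPC thread saving trace segments.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                buffer.save("segment-" + i);
            }
        });
        producer.start();

        // Consumer: stands in for the persistence worker. It polls the buffer
        // until it has seen all 10 items (or 5 s pass); the queue's internal
        // lock makes the producer's writes visible to this thread.
        List<String> persisted = new ArrayList<>();
        long deadline = System.currentTimeMillis() + 5_000;
        while (persisted.size() < 10 && System.currentTimeMillis() < deadline) {
            persisted.addAll(buffer.obtain());
            Thread.sleep(10);
        }
        producer.join();
        System.out.println("persisted " + persisted.size() + " items");
    }
}
```

Running `main` prints `persisted 10 items`. With `offer`, a full buffer drops new data rather than stalling the gRPC thread; whether that trade-off is acceptable depends on how much trace loss the deployment can tolerate.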
      2. The gRPC server uses the JDK CachedThreadPool, whose thread count grows with the number of requests. So when a large number of requests arrive and the OAP server
or ES does not have enough capacity, the number of gRPC threads can grow very large, and then the OAP server crashes. So I changed the code as shown in the screenshot I attached.
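For reference, `Executors.newCachedThreadPool()` has no practical thread ceiling (core size 0, maximum `Integer.MAX_VALUE`, a `SynchronousQueue`, 60 s keep-alive), so every request that cannot be handed to an idle thread gets a new one. A bounded alternative caps both the thread count and the backlog. The sketch below is illustrative only: `BoundedGrpcExecutor` and its sizes are assumptions, and while grpc-java's `ServerBuilder` does accept a custom pool through `executor(Executor)`, the change actually made in the OAP server is not shown in this thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedGrpcExecutor {
    /**
     * Hypothetical factory for a bounded pool: at most maxThreads threads and at
     * most queueCapacity waiting tasks. When both are exhausted, AbortPolicy
     * throws RejectedExecutionException instead of creating another thread.
     */
    public static ThreadPoolExecutor create(int coreThreads, int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
                coreThreads,
                maxThreads,
                60L, TimeUnit.SECONDS,                   // idle non-core threads exit after 60 s
                new ArrayBlockingQueue<>(queueCapacity), // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());   // fail fast when saturated
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(2, 4, 10);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        // Simulate a burst of 20 slow requests (each blocks until released).
        for (int i = 0; i < 20; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        // 2 core threads + 10 queued + 2 extra threads = 14 accepted; 6 rejected.
        System.out.println("threads=" + pool.getPoolSize() + ", rejected=" + rejected);
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Running `main` prints `threads=4, rejected=6`. Note that `ThreadPoolExecutor` only adds threads beyond the core size once the queue is full, so the queue capacity and maximum size should be tuned together; `ThreadPoolExecutor.CallerRunsPolicy` is an alternative to rejection that slows the caller down instead.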
     3. We use Consul as the registry center. Consul's health check over gRPC is not stable, so I changed it to a TCP check, as shown in the screenshot I attached.
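A Consul TCP check only verifies that the target accepts a TCP connection, whereas a gRPC check calls the standard gRPC health-checking service on the target, so the TCP variant has fewer moving parts. In a Consul check definition the switch corresponds to giving a `tcp` target (`host:port`) instead of a `grpc` one. The sketch below is a hypothetical `TcpHealthProbe`, not Consul's code: it performs the same kind of connect-only probe from Java, which may help confirm that the OAP's gRPC port is accepting connections while a check looks flaky.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpHealthProbe {
    /**
     * Returns true if a TCP connection to host:port is established within
     * timeoutMillis. Like a Consul TCP check, it says nothing about whether
     * the gRPC service behind the port is healthy, only that it is listening.
     */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unresolvable host
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the OAP gRPC port: any listening socket passes a TCP check.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();
            System.out.println("reachable=" + isReachable("127.0.0.1", port, 1_000));
        }
    }
}
```

Running `main` prints `reachable=true`, since the OS accepts the connection into the listen backlog even though the server never calls `accept()`.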


Above are the issues I encountered in our production environment.




BRs,
Wu Yantao(吴延涛)




Re: OAP server's performance issue

Posted by Sheng Wu <wu...@gmail.com>.
Hi

Since you keep failing to send the attachments, please submit an issue on GitHub. We can discuss it there too.

Sheng Wu
Apache Skywalking, ShardingSphere, Zipkin



> On Aug 7, 2019, at 11:12 AM, 吴延涛 <wu...@126.com> wrote:
> 
> 
> FYI.


Re:Re:Re: OAP server's performance issue

Posted by 吴延涛 <wu...@126.com>.
FYI.





At 2019-08-07 10:56:47, "吴延涛" <wu...@126.com> wrote:



Hi All,


       See the graphs in the attachments.
For the first issue, I want to add a supplement. The root cause is a visibility issue in multithreaded programming.
According to the JMM (Java Memory Model), when one thread modifies the buffer, another thread is not guaranteed to see the
modification without some form of synchronization. So I changed the buffer array to the JDK ArrayBlockingQueue,
using a blocking queue to guarantee visibility between the different threads.
In the code I use the non-blocking method; if you want the blocking behavior, change the offer method to the put method.







Re:Re: OAP server's performance issue

Posted by 吴延涛 <wu...@126.com>.

Hi All,


       See the graphs in the attachments.
For the first issue, I want to add a supplement. The root cause is a visibility issue in multithreaded programming.
According to the JMM (Java Memory Model), when one thread modifies the buffer, another thread is not guaranteed to see the
modification without some form of synchronization. So I changed the buffer array to the JDK ArrayBlockingQueue,
using a blocking queue to guarantee visibility between the different threads.
In the code I use the non-blocking method; if you want the blocking behavior, change the offer method to the put method.
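To make the `offer` versus `put` choice concrete, here is a small sketch against a full `ArrayBlockingQueue` of capacity 1 (the class name `OfferVsPut` is illustrative). `offer` returns `false` immediately, so the item is dropped; `put` waits until a consumer frees a slot; the timed `offer(e, timeout, unit)` sits in between by waiting only up to a limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OfferVsPut {
    /** Runs offer, put and timed offer against a full queue; returns what happened. */
    public static List<String> demo() throws InterruptedException {
        List<String> log = new ArrayList<>();
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("first"); // the single slot is now taken

        // offer(): non-blocking. On a full queue it returns false at once,
        // so the producer drops the item instead of waiting.
        log.add("offer accepted=" + queue.offer("second"));

        // put(): blocking. It waits until a consumer frees a slot; here a
        // consumer thread takes "first" after ~100 ms, which unblocks put().
        Thread consumer = new Thread(() -> {
            try {
                Thread.sleep(100);
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        queue.put("third");
        consumer.join();
        log.add("queue head=" + queue.peek());

        // Timed offer(): waits at most 50 ms for space, then gives up.
        log.add("timed offer accepted=" + queue.offer("fourth", 50, TimeUnit.MILLISECONDS));
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

Running `main` prints `offer accepted=false`, `queue head=third`, and `timed offer accepted=false`. In a buffer fed by gRPC threads, `put` trades dropped data for stalled request threads, so which method fits depends on which failure is easier to live with under load.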






At 2019-08-06 20:39:55, "Sheng Wu" <wu...@gmail.com> wrote:
>Hi
>
>Thanks for the feedback. We can't see the graphs you posted, could you send
>them in attachments?
>
>Sheng Wu 吴晟
>
>Apache SkyWalking, Apache ShardingSphere(Incubating), Zipkin
>Twitter, wusheng1108

Re: OAP server's performance issue

Posted by Sheng Wu <wu...@gmail.com>.
Hi

Thanks for the feedback. We can't see the graphs you posted. Could you send
them as attachments?

Sheng Wu 吴晟

Apache SkyWalking, Apache ShardingSphere(Incubating), Zipkin
Twitter, wusheng1108

