
Posted to dev@eagle.apache.org by "zhangchi@nucc.com" <zh...@nucc.com> on 2019/05/17 11:49:30 UTC

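Help with a performance problem

I have run into a performance problem: when a spout subscribes to messages from multiple Kafka topics, throughput suffers, and each topic cannot reach 40,000 TPS. My analysis is that the cause is in the following code, where the loop's throughput drops linearly as the number of topics grows. Was this considered in the design, and how should it be solved?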



张驰
Organization: Application Technology Department, Channel Management Group
Phone: 18511460690

Re: Re: Help with a performance problem

Posted by Zhao Qingwen <qi...@gmail.com>.

Hi Chi,

1. "topology.numOfSpoutTasks" is specified when you create the alert
topology.
2. In Eagle, all alert topologies are shared by all policies. You cannot
assign a policy to a specific alert topology. (At least that is how it
worked a year ago.)
    There are some rules for how a policy is assigned to the working
slots/queue/alert bolts. Please refer to

https://github.com/apache/eagle/blob/master/eagle-core/eagle-alert-parent/eagle-alert/alert-coordinator/src/main/java/org/apache/eagle/alert/coordinator/impl/GreedyPolicyScheduler.java#L234

    The number of policies an alert topology can serve is limited by
"topology.numOfAlertTasks". In other words, total policyParallelismHint <=
"topology.numOfAlertTasks".
3. Back to your question: create multiple alert topologies, each with a
limited number of alert tasks.
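
The capacity rule in point 2 and the suggestion in point 3 can be illustrated with a small sketch. This is not Eagle's GreedyPolicyScheduler; it is a simplified first-fit model, and the names Policy, parallelism_hint, and assign_policies are hypothetical. It assumes only the constraint stated above: the parallelism hints placed on one topology must not add up to more than that topology's numOfAlertTasks.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    parallelism_hint: int  # stands in for policyParallelismHint (assumption)


def assign_policies(policies, num_topologies, num_alert_tasks):
    """First-fit placement: put each policy on the first topology with room.

    Returns one list of policy names per topology. Raises ValueError when a
    policy fits on no topology under the stated constraint.
    """
    used = [0] * num_topologies
    placement = [[] for _ in range(num_topologies)]
    for policy in policies:
        for i in range(num_topologies):
            if used[i] + policy.parallelism_hint <= num_alert_tasks:
                used[i] += policy.parallelism_hint
                placement[i].append(policy.name)
                break
        else:
            # No topology had enough free alert tasks for this policy.
            raise ValueError(f"no topology has room for {policy.name}")
    return placement


policies = [Policy("p1", 4), Policy("p2", 4), Policy("p3", 1)]
placement = assign_policies(policies, num_topologies=2, num_alert_tasks=5)
print(placement)
```

With a small per-topology task limit, the second policy no longer fits next to the first and spills over to the second topology, which is the effect point 3 relies on.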

Hope this helps :)

Best Regards,
Qingwen Zhao | 赵晴雯






Re: Re: Help with a performance problem

Posted by Edward Zhang <yo...@gmail.com>.

Hi Zhang Chi,

Thanks for your email. There is a configuration, "topology.numOfSpoutTasks",
that sets the number of tasks in the spout node. Sorry, I don't remember
where it is specified.
It can probably be set in the topology config before you start the
application topology.

The original design allows policies to be divided and deployed to
different topologies for the same input stream, see
https://cwiki.apache.org/confluence/display/EAG/Alert+Engine+Design?preview=/65877337/65877340/AlertEngine-251016-2018-736.pdf
.
However, this scalability does not seem to be transparent to the
application: a user can define policies on different topologies that share
the same input stream, but Eagle cannot automatically distribute policies
across topologies.

@hao <ha...@apache.org>  @Su Ralph (JIRA) <ji...@apache.org>  could you please
confirm the statements above?

Thanks again, Chi, for digging into these problems.

Thanks
Edward

On Tue, May 21, 2019 at 9:19 PM zhangchi@nucc.com <zh...@nucc.com> wrote:

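> Dear Edward,
>
> Last time you said that, to solve the performance problem, different
> topics can be placed in different topologies. But I have now run into a
> problem: how do I make my policy run in a specific topology?
>
> ------------------------------
> *张驰*
> Organization: Application Technology Department, Channel Management Group
> Phone: 18511460690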
>
>
> *From:* zhangchi@nucc.com
> *Sent:* 2019-05-18 19:23
> *To:* Edward Zhang <yo...@gmail.com>
> *Subject:* Re: Re: Help with a performance problem
> Thank you, Edward.
>    Is there any way to increase the parallelism of the spout, for example
> by adding partitions to the Kafka topic? I tried increasing the number of
> tasks in the spout node, but it had no effect, and the calculated data was
> not accurate enough. What is the reason?
>     Thank you for your reply. I hope you are happy every day.
>
> ------------------------------
> *张驰*
> Organization: Application Technology Department, Channel Management Group
> Phone: 18511460690
>
>
> *From:* Edward Zhang <yo...@gmail.com>
> *Sent:* 2019-05-18 01:03
> *To:* dev <de...@eagle.apache.org>; zhangchi <zh...@nucc.com>
> *Subject:* Re: Help with a performance problem
> Hi ZhangChi,
>
> There may be an issue when many topics coexist in the same spout, because
> each nextTuple() call may take time. The original design did consider the
> many-topics scenario; for example, it can be handled by deploying multiple
> Storm topologies.
>
> Can you use more task nodes for the spout if your data has been
> partitioned well?
>
> Thanks
> Edward
>
> On Fri, May 17, 2019 at 9:04 AM zhangchi@nucc.com <zh...@nucc.com>
> wrote:
>
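>> I have run into a performance problem: when a spout subscribes to
>> messages from multiple Kafka topics, throughput suffers, and each topic
>> cannot reach 40,000 TPS. My analysis is that the cause is in the
>> following code, where the loop's throughput drops linearly as the number
>> of topics grows. Was this considered in the design, and how should it be
>> solved?
>>
>> ------------------------------
>> *张驰*
>> Organization: Application Technology Department, Channel Management Group
>> Phone: 18511460690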
>>
>

Re: Help with a performance problem

Posted by Edward Zhang <yo...@gmail.com>.

Hi ZhangChi,

There may be an issue when many topics coexist in the same spout, because
each nextTuple() call may take time. The original design did consider the
many-topics scenario; for example, it can be handled by deploying multiple
Storm topologies.
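
A rough model of why per-topic throughput can fall as topics are added to one spout: if a single spout task visits its topics one after another, the time budget is shared among them. The sketch below is a back-of-the-envelope calculation, not Eagle's spout code; per_topic_tps, poll_seconds, and tuples_per_poll are hypothetical names, and a fixed cost per poll is an assumption.

```python
def per_topic_tps(num_topics, poll_seconds, tuples_per_poll):
    """Estimated tuples per second for each topic when one task polls
    num_topics topics in turn and every poll takes poll_seconds."""
    if num_topics < 1:
        raise ValueError("num_topics must be at least 1")
    round_seconds = num_topics * poll_seconds  # one full pass over all topics
    return tuples_per_poll / round_seconds


# Per-topic rate shrinks in proportion to 1/num_topics under this model.
for n in (1, 2, 4, 8):
    print(n, per_topic_tps(n, poll_seconds=0.001, tuples_per_poll=100))
```

Under this model, splitting the topics across several topologies (or several spout tasks) reduces the number of topics each loop has to visit, which is what raises the per-topic rate.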

Can you use more task nodes for the spout if your data has been partitioned
well?
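
The remark about partitioning matters because, in a typical Kafka consumer setup, a partition is read by at most one task at a time, so spout tasks beyond the partition count have nothing to read. The sketch below uses a simple round-robin assignment to show this. It is a general illustration, not how Eagle or Storm's Kafka spout actually assigns partitions, and assign_partitions is a hypothetical helper.

```python
def assign_partitions(num_partitions, num_tasks):
    """Round-robin partitions over tasks.

    Returns a dict mapping task index to the list of partitions it reads.
    """
    assignment = {task: [] for task in range(num_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_tasks].append(partition)
    return assignment


assignment = assign_partitions(num_partitions=3, num_tasks=5)
# Tasks that received no partition do no useful work.
idle = [task for task, parts in assignment.items() if not parts]
print(assignment)
print("idle tasks:", idle)
```

If this model applies, adding spout tasks helps only up to the number of partitions; beyond that, the topic itself needs more partitions.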

Thanks
Edward

On Fri, May 17, 2019 at 9:04 AM zhangchi@nucc.com <zh...@nucc.com> wrote:

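> I have run into a performance problem: when a spout subscribes to
> messages from multiple Kafka topics, throughput suffers, and each topic
> cannot reach 40,000 TPS. My analysis is that the cause is in the following
> code, where the loop's throughput drops linearly as the number of topics
> grows. Was this considered in the design, and how should it be solved?
>
> ------------------------------
> *张驰*
> Organization: Application Technology Department, Channel Management Group
> Phone: 18511460690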
>