Posted to dev@trafficserver.apache.org by Yongming Zhao <mi...@gmail.com> on 2013/11/21 07:45:31 UTC

the cluster refine codes in on refine_cluster branch

20 hours ago, Weijin pushed out our first effort at forward-porting our refined cluster code. In this big fat patch, we have refined the cluster communication at the message level to achieve better performance.

What we did:
	• made the cluster a purely message-driven layer, with no more VC splicing on each side
	• cleaned up the message encapsulation and callback implementations
	• modified the cache cluster interface
Because the change is so big, we also changed some other things in the cluster:
	* load monitor
	* hostdb cluster interface
Our main platform is Linux, and because we could not reuse the network code, we have put some dirty code into the new cluster. It will need more work to clean up and to make it multi-platform aware.

Anyway, it is out. Please join us in hacking and testing.

This code handles >4G/box of traffic in our network, and cluster performance is no longer a limit.

Please refer to the code and wiki:
code in:
https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;a=shortlog;h=refs/heads/refine_cluster
wiki at:
https://cwiki.apache.org/confluence/display/TS/Clustering
jira at:
https://issues.apache.org/jira/browse/TS-2005

The patch is a joint effort of WeiJin and YuQing.
Thanks



Yongming Zhao
赵永明
aka 永豪 yonghao@taobao.com


Re: the cluster refine codes in on refine_cluster branch

Posted by Yongming Zhao <mi...@gmail.com>.
https://cwiki.apache.org/confluence/display/TS/Clustering

Here are the long-awaited documents on how to configure and manage the new cluster. I will translate them all into English.

The merged code may have some issues with building and performance, so please don't put it in production.

thanks




Re: the cluster refine codes in on refine_cluster branch

Posted by Yongming Zhao <mi...@gmail.com>.
Yeah, those are the original docs for the cluster. Tomorrow I will update the wiki with all the config changes we made for the refined cluster.

thanks

On 2013-11-21, at 6:01 PM, Igor Galić <i....@brainsware.org> wrote:

> 
> Thanks Yongming, but do you have archive doc/guide to set up cluster?
> 
> https://trafficserver.readthedocs.org/en/latest/admin/cluster-howto.en.html
> 
> 
> -- 
> Igor Galić
> 
> Tel: +43 (0) 664 886 22 883
> Mail: i.galic@brainsware.org
> URL: http://brainsware.org/
> GPG: 8716 7A9F 989B ABD5 100F  4008 F266 55D6 2998 1641
> 





Re: the cluster refine codes in on refine_cluster branch

Posted by Igor Galić <i....@brainsware.org>.
----- Original Message -----

> Thanks Yongming, but do you have archive doc/guide to set up cluster?

https://trafficserver.readthedocs.org/en/latest/admin/cluster-howto.en.html 

-- 
Igor Galić 

Tel: +43 (0) 664 886 22 883 
Mail: i.galic@brainsware.org 
URL: http://brainsware.org/ 
GPG: 8716 7A9F 989B ABD5 100F 4008 F266 55D6 2998 1641 

Re: the cluster refine codes in on refine_cluster branch

Posted by "Neddy, NH. Nam" <na...@nd24.net>.
Thanks Yongming, but do you have archive doc/guide to set up cluster?



Re: the cluster refine codes in on refine_cluster branch

Posted by Igor Galić <i....@brainsware.org>.
----- Original Message -----
> 20 hours ago, Weijin pushed out our first effort of forward porting our
> cluster refine codes, in this big fat patch, we have refined the cluster

First of all I want to say a big thanks for this gigantic effort.
Those results look truly impressive.

I do however have some criticism as to how it has been delivered:

I think it would have been better as a series of patches that show the
evolution, with better comments and commit messages explaining the
specific changes that went into each bigger change. That way we could
also split the over-arching change into multiple Jira tickets.
We did this for TS-2281 (Augment config system to use LUA).

A patch this big makes it a lot harder for anyone reviewing the code.
As such, we have to fall back to asking silly questions on the Mailing List ;)

> communication in the message level, to archive better performance.
> 
> what we do:
>     • make cluster a pure message driven layer, no more vc splice on each side

What message protocol are we using here?

>     • cleanup the msg encapsulation and callback implements
>     • modified the cache cluster interface

Does this touch the Cache APIs or the internal Cache on-disk layout?
i.e.: What does it mean to people who do not use the cluster?

> due to the big change we made, there is something we changed in the cluster:
>     * load monitor

Can this be "abstracted" into an API that load-balancer plugins could use?

>     * hostdb cluster interface
> as our main platform is Linux, and due to the network codes we can not reuse,

With small patches going into the branch, this is something other
contributors interested in those platforms could have helped with.

> we have made some dirty codes into the new cluster, that will need more work
> to get clean, to make it multi-platform aware and clean.

With small patches going into the branch, this is something reviewers
could have caught in the process, as the design's implementation unfolds.
 
> anyway, it is out, please join us on the hacking and testing.
>
> those codes performs >4G/box traffic in our network, and there is no limit in
> cluster performance anymore
> 
> please refer to the codes and wiki:
> codes in:
> https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;a=shortlog;h=refs/heads/refine_cluster
> wiki at:
> https://cwiki.apache.org/confluence/display/TS/Clustering
> jira at:
> https://issues.apache.org/jira/browse/TS-2005
> 
> the patch is a joint effort of WeiJin and YuQing
> thanks


-- 
Igor Galić

Tel: +43 (0) 664 886 22 883
Mail: i.galic@brainsware.org
URL: http://brainsware.org/
GPG: 8716 7A9F 989B ABD5 100F  4008 F266 55D6 2998 1641


Re: the cluster refine codes in on refine_cluster branch

Posted by James Peach <jp...@apache.org>.
Thanks guys, and thanks for writing up the changes. I'll try to review over the next few weeks and ask a lot of questions.

One thing I'd appreciate a lot is some guidance on how to test clustering. Do you have advice on setting up a performance benchmark for testing the cluster? Do you have any measurements from your own testing that you can share?

J
