Posted to dev@shindig.apache.org by Chris Chabot <ch...@xs4all.nl> on 2008/05/09 22:31:28 UTC

Need some advice - features, inline or external?

Hey guys, I could do with some advice.

== the problem ==

In the java version, the features are all parsed and their javascript  
content is loaded into memory. This works on the java side, and gives  
the opportunity to cajole the entire content in one fell swoop, so that  
works great.

Now on the PHP side of things, things are a bit different since PHP  
works in a process-per-request situation, so parsing the entire  
features structure on each request is not doable; it would make any  
semi-decent performance impossible to achieve. So instead I process  
the features once and cache the entire resulting structure; that's  
about twice as fast as processing the features structure on each  
request, so it's survivable.

Survivable but far from optimal: it's a lot of information to read  
from cache, and a lot of memory consumed (since every process has its  
own instance, it adds up), so that puts a good bit of pressure on the  
server's IO.
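
For reference, the current setup boils down to roughly this (a much
simplified sketch with made-up helper and path names, not the actual
shindig-php code):

  <?php
  // Parse every feature once, cache the whole structure, and unserialize
  // it again on every request.
  function parseAllFeatures($featuresDir) {
    $registry = array();
    foreach (glob("$featuresDir/*/feature.xml") as $xmlFile) {
      $xml = simplexml_load_file($xmlFile);
      $registry[(string)$xml->name] = array(
        'deps' => array_map('strval', $xml->xpath('//dependency')),
        'js'   => implode('', array_map('file_get_contents',
                    glob(dirname($xmlFile) . '/*.js'))),
      );
    }
    return $registry;
  }

  function getFeatureRegistry($featuresDir, $cacheFile) {
    if (file_exists($cacheFile)) {
      // Every request pays for reading + unserializing the full structure.
      return unserialize(file_get_contents($cacheFile));
    }
    $registry = parseAllFeatures($featuresDir);
    file_put_contents($cacheFile, serialize($registry), LOCK_EX);
    return $registry;
  }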

To add some measurability to this, on a quad core @ 3ghz workstation  
this gets me about 420 pages a second with apache bench.

== the solution ? ==

The main problem is the overhead of loading all the features  
javascript on each request; this consumes tons of memory and takes  
loads of IO, so the overly obvious solution is to not do this anymore :)

So what would work is that I make all javascript external (<script  
src="...">), generate script tags for each feature (and its  
dependencies) and modify the javascript handler (/gadgets/js) to only  
output the javascript for the requested feature.
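
In rough sketch form (made-up helper names, and glossing over the real
renderer), the rendering side would emit something like:

  <?php
  // Emit one external <script src> per feature (and its dependencies)
  // instead of inlining all the feature javascript into the gadget.
  function resolveDeps($feature, array $registry, array &$resolved) {
    if (isset($resolved[$feature]) || !isset($registry[$feature])) {
      return;
    }
    foreach ($registry[$feature]['deps'] as $dep) {
      resolveDeps($dep, $registry, $resolved);
    }
    $resolved[$feature] = true;
  }

  function emitFeatureScriptTags(array $features, array $registry, $jsPath, $version) {
    $resolved = array();
    foreach ($features as $feature) {
      resolveDeps($feature, $registry, $resolved);
    }
    foreach (array_keys($resolved) as $feature) {
      // One request per feature; the version param busts the cache on upgrades.
      echo '<script src="' . htmlspecialchars("$jsPath/$feature.js?v=$version")
         . '"></script>' . "\n";
    }
  }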

There are a few possible downsides I can identify though:

More requests, one per feature. However, with an expiration date in  
the far, far future and a cache-busting version param, this should be  
negligible... besides, the amount of bandwidth used would go down  
tremendously (a few small kb for a gadget instead of 180kb or so  
because of all the inline javascript), so the combination of browser  
side caching and the savings on bandwidth / time to transfer all this  
info should actually have a positive effect, right?
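
The caching side of that is just the /gadgets/js handler sending
far-future headers, roughly like this (a sketch, not the real handler):

  <?php
  // Far-future caching for the feature javascript; the cache-busting version
  // param in the URL means the content can be treated as immutable.
  $featureJs = '/* concatenated javascript for the requested feature(s) */';
  $oneYear   = 365 * 24 * 60 * 60;
  header('Content-Type: text/javascript; charset=UTF-8');
  header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $oneYear) . ' GMT');
  header('Cache-Control: public, max-age=' . $oneYear);
  echo $featureJs;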

The second risk is that it could add some perceived latency, since  
gadgets.config.init() and the onLoad handlers can't be called until  
the document has completed, which includes handling the javascript  
files and whatever external resources the gadget includes... This is  
probably the biggest problem with this solution.

And finally, it would probably make cajoling impossible... but that  
doesn't concern me so much right now since we don't have a mechanism  
for php shindig to do that anyhow :)

With that 'small' modification, the pages/second shoots up to 630, a  
very significant increase, and that's with just a few hacks and not a  
proper implementation of this option.

So the performance gains seem significant enough to consider this,  
both in server load, pages/second and bandwidth saved... however, as  
mentioned, there are a few risks involved too.

What do you all reckon would be the right solution here? I hope I can  
get your opinions on which would be the better choice, since I'm a  
bit torn between the two.

	-- Chris


Re: Need some advice - features, inline or external?

Posted by Brian Eaton <be...@google.com>.
On Fri, May 9, 2008 at 4:57 PM, Chris Chabot <ch...@xs4all.nl> wrote:
>
>  Plain old requests / second without anything else
>

In that case it sounds like your 630 vs 420 number involves the server
skipping some steps.  If you include the extra requests necessary to
fetch the javascript (putting more load on the server), I think you
might find that the overall result is slower.  Or maybe not.  As you
pointed out in your original note, browser and ISP caches have a role
to play.

Re: Need some advice - features, inline or external?

Posted by Chris Chabot <ch...@xs4all.nl>.
> Yes, I know it's a purely synthetic number with limited real world
> relevance, but I'm still curious about what you're counting as a page.
> =)  Is it requests/second, or does each page in your metric include
> several requests?

Plain old requests / second without anything else

Re: Need some advice - features, inline or external?

Posted by Brian Eaton <be...@google.com>.
On Fri, May 9, 2008 at 4:44 PM, Chris Chabot <ch...@xs4all.nl> wrote:
> > About the 420 pages/sec vs 630 pages/sec number... what are you
> > counting as a page?
> >
>
>  That's purely a synthetic, localhost benchmark using apache bench (ab).
> Hence the explicit 'synthetic' (since it doesn't deal with the 'real world'
> of the internet), but it still gives something of a measuring point to
> compare different implementations' performance with :)

Yes, I know it's a purely synthetic number with limited real world
relevance, but I'm still curious about what you're counting as a page.
=)  Is it requests/second, or does each page in your metric include
several requests?

Re: Need some advice - features, inline or external?

Posted by Chris Chabot <ch...@xs4all.nl>.
On May 10, 2008, at 1:17 AM, Brian Eaton wrote:

> (Warning: not a PHP user, likely to be speaking nonsense.)  It sounds
> like something you're doing is screwing up the kernel's I/O cache.

Well, the kernel's file system cache is what makes it survivable in  
the first place; without putting the entire features structure on disk  
and reading it back on each page request, it runs at about half the  
speed it does with it.

Luckily file_get_contents() (the operation used to read the cache  
file) already uses memory mapping techniques, so the reading of the  
information should be relatively optimal.

I've got a feeling it's mostly PHP being a bit stressed about having  
to absorb so much information, parse the serialized data  
(serialize(<some object>) creates a JSON-like structure with class  
names and values) and ram it into its internal virtual machine so many  
times a second. I'd currently guess that that is where the  
bottleneck is.


> What you *really* want is for each process to read from a file and
> stream to a socket.  The kernel will do a good job of figuring out
> that those files ought to be kept in memory, so the individual
> processes don't pay for any disk I/O as they work.  Because the files
> are being streamed in chunks it doesn't take much memory per-process
> either.

Just as I was going to type that readfile() does that and that I'd  
give it a shot to see if it makes a difference, I saw Kevin's reply  
come in saying the same, so I won't repeat that :)

Optimally I would probably prefer to have a separate (multithreaded,  
or socket select()'ing and chunking) daemon for this that just keeps  
all the features stuff in memory and goes from there, but that would  
kind of defeat the point of having a 'simple' php implementation :)

> It is not uncommon for the kernel to do a better job of
> caching data from disk than an HTTP server can do.

Something I use religiously (including in shindig; all its default  
caching is based on this very principle). Strangely enough, a bunch  
of kernel developers writing an optimized algorithm in C do a bit of  
a better job than something hacked together in PHP :-)

> About the 420 pages/sec vs 630 pages/sec number... what are you
> counting as a page?

That's purely a synthetic, localhost benchmark using apache bench  
(ab). Hence the explicit 'synthetic' (since it doesn't deal with the  
'real world' of the internet), but it still gives something of a  
measuring point to compare different implementations' performance with :)

	-- Chris


Re: Need some advice - features, inline or external?

Posted by Kevin Brown <et...@google.com>.
On Fri, May 9, 2008 at 4:17 PM, Brian Eaton <be...@google.com> wrote:

> (Warning: not a PHP user, likely to be speaking nonsense.)  It sounds
> like something you're doing is screwing up the kernel's I/O cache.
> What you *really* want is for each process to read from a file and
> stream to a socket.  The kernel will do a good job of figuring out
> that those files ought to be kept in memory, so the individual
> processes don't pay for any disk I/O as they work.  Because the files
> are being streamed in chunks it doesn't take much memory per-process
> either.
>
> Even better would be to have access to something like the sendfile
> syscall, so you can skip user space entirely.  Again, to make that
> possible you'd need to write the file to disk to take advantage of the
> kernel.  It is not uncommon for the kernel to do a better job of
> caching data from disk than an HTTP server can do.


FYI, in PHP you can do this with http://us2.php.net/readfile. This reads the
data directly from file and pipes it into the output stream.
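
Something along these lines (sketch only; the path is made up):

  <?php
  // Stream a pre-generated feature js file straight to the output buffer,
  // letting the kernel's page cache keep the hot files in memory.
  $file = '/var/cache/shindig/features/dynamic-height.js';  // made-up path
  if (!is_file($file)) {
    header('HTTP/1.0 404 Not Found');
    exit;
  }
  header('Content-Type: text/javascript; charset=UTF-8');
  header('Content-Length: ' . filesize($file));
  readfile($file);  // reads from disk (or page cache) and writes out in chunks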


>
>
> About the 420 pages/sec vs 630 pages/sec number... what are you
> counting as a page?  Does each page load include the requests to
> download the extra js, or are you assuming you get those for free?
>

Re: Need some advice - features, inline or external?

Posted by Brian Eaton <be...@google.com>.
On Fri, May 9, 2008 at 2:34 PM, Chris Chabot <ch...@xs4all.nl> wrote:
> The main problem is the overhead of loading all the features javascript
> on each request; this consumes tons of memory and takes loads of IO,
> so the overly obvious solution is to not do this anymore :)
<snip>
>  I'll play a bit with how the cache is set up, and see if loading the
> javascript off disk, while caching the features xml and dependency graph,
> is more efficient in the end, or see if I can't think of some other solutions...
>
>  If some other ideas bubble up for anyone, please let me know :-)

(Warning: not a PHP user, likely to be speaking nonsense.)  It sounds
like something you're doing is screwing up the kernel's I/O cache.
What you *really* want is for each process to read from a file and
stream to a socket.  The kernel will do a good job of figuring out
that those files ought to be kept in memory, so the individual
processes don't pay for any disk I/O as they work.  Because the files
are being streamed in chunks it doesn't take much memory per-process
either.

Even better would be to have access to something like the sendfile
syscall, so you can skip user space entirely.  Again, to make that
possible you'd need to write the file to disk to take advantage of the
kernel.  It is not uncommon for the kernel to do a better job of
caching data from disk than an HTTP server can do.

About the 420 pages/sec vs 630 pages/sec number... what are you
counting as a page?  Does each page load include the requests to
download the extra js, or are you assuming you get those for free?

Re: Need some advice - features, inline or external?

Posted by Chris Chabot <ch...@xs4all.nl>.
The single request vs multiple requests for features is purely a  
matter of style; both are very easy to do.

In my quick test case with a couple of gadgets, their feature  
combinations were all pretty unique, so in the end it was actually  
fewer transfers when separating the features than when combining them.  
Right now it's purely a guess which would end up being more efficient,  
but on the other hand it might well be a moot point too since, as you  
pointed out, I also think most people prefer not having the extra  
onload-related latency.

Ropu pointed out in chat that it could be an idea to put all scripts  
together in one request and put a gadgets.util.runOnLoadHandlers();  
call at the bottom of it, but I think browsers are unreliable in the  
order in which they download external resources, and in what part of  
the document will and will not be available at the time it executes  
(if that javascript is loaded from the browser's disk cache, that  
might bug out too); so I'm not entirely sure that would be an option  
either.

I guess, if I can't find a better solution, making it a configurable  
choice is a viable option. Trade-offs are always a shame, aren't they,  
and 630 vs 420 pages a second is a pretty damn big difference :-)

I'll play a bit with how the cache is set up, and see if loading the  
javascript off disk, while caching the features xml and dependency  
graph, is more efficient in the end, or see if I can't think of some  
other solutions...

If some other ideas bubble up for anyone, please let me know :-)

	-- Chris

On May 9, 2008, at 11:17 PM, John Hjelmstad wrote:

> That's true, particularly if you're caching the results of these  
> requests
> (whether in memory or on disk). In practice the issue might not be  
> that
> problematic depending on how many feature-configurations are used by  
> gadgets
> rendered by a given server (though the issue will get more  
> pronounced as
> time goes on). Another option is to learn what feature combinations  
> are most
> popular and create batched JS files for them. So f1:f2:f3:f4.js  
> could turn
> into f1:f3:f4.js and f2.js if the former combination occurs often.
>
> Which approach is best will depend on server characteristics  
> (memory, disk,
> I/O binding), actual gadget feature usage, and browser  
> characteristics, most
> likely.
>
> John


Re: Need some advice - features, inline or external?

Posted by John Hjelmstad <fa...@google.com>.
That's true, particularly if you're caching the results of these requests
(whether in memory or on disk). In practice the issue might not be that
problematic depending on how many feature-configurations are used by gadgets
rendered by a given server (though the issue will get more pronounced as
time goes on). Another option is to learn what feature combinations are most
popular and create batched JS files for them. So f1:f2:f3:f4.js could turn
into f1:f3:f4.js and f2.js if the former combination occurs often.

Which approach is best will depend on server characteristics (memory, disk,
I/O binding), actual gadget feature usage, and browser characteristics, most
likely.
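
As a sketch of the bookkeeping that would need (purely hypothetical;
nothing Shindig does today, and the names are made up):

  <?php
  // Count which feature combinations actually get requested, so the popular
  // ones can be pre-built as batched js files.
  function recordCombination(array $features, $statsFile) {
    sort($features);                    // f1:f2 and f2:f1 count as one combo
    $key   = implode(':', $features);
    $stats = file_exists($statsFile)
        ? unserialize(file_get_contents($statsFile)) : array();
    $stats[$key] = isset($stats[$key]) ? $stats[$key] + 1 : 1;
    file_put_contents($statsFile, serialize($stats), LOCK_EX);
  }

  function popularCombinations($statsFile, $minHits) {
    $stats = file_exists($statsFile)
        ? unserialize(file_get_contents($statsFile)) : array();
    $popular = array();
    foreach ($stats as $combo => $hits) {
      if ($hits >= $minHits) {
        $popular[] = $combo;            // e.g. "f1:f3:f4", worth a batched file
      }
    }
    return $popular;
  }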

John

On Fri, May 9, 2008 at 2:02 PM, Ropu <ro...@gmail.com> wrote:

> I thought about the last approach, to reduce this to only one extra request.
>
> But to have all the possible combinations of features in different JS files
> would create a LOT of JS files...
>
> f1.js, f1f2.js, f1f3.js, f1f2f3.js etc, etc. (there are more than 10
> features...)
>
> Perhaps an approach would be to check if the file exists, and if not, create
> it. This will only add overhead to the first call, and a little more to
> check if the file exists on each call.
>
> ropu

Re: Need some advice - features, inline or external?

Posted by Ropu <ro...@gmail.com>.
I thought about the last approach, to reduce this to only one extra request.

But to have all the possible combinations of features in different JS files
would create a LOT of JS files...

f1.js, f1f2.js, f1f3.js, f1f2f3.js etc, etc. (there are more than 10
features...)

Perhaps an approach would be to check if the file exists, and if not, create
it. This will only add overhead to the first call, and a little more to
check if the file exists on each call.
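
something like this (just a sketch; the paths and helpers are invented):

  <?php
  // Serve the cached js for this feature combination if it exists, otherwise
  // generate it once and write it out for the next request.
  function serveCombinedJs(array $features, array $registry, $cacheDir) {
    sort($features);                     // f1:f2 and f2:f1 map to the same file
    $file = $cacheDir . '/' . implode(':', $features) . '.js';
    if (!file_exists($file)) {           // only the first call pays for this
      $js = '';
      foreach ($features as $feature) {
        if (isset($registry[$feature])) {
          $js .= $registry[$feature]['js'];
        }
      }
      file_put_contents($file, $js, LOCK_EX);
    }
    header('Content-Type: text/javascript; charset=UTF-8');
    readfile($file);
  }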

ropu

On Fri, May 9, 2008 at 5:49 PM, John Hjelmstad <fa...@google.com> wrote:

> I'm not sure cajoling would be precluded by external <script src> includes.
> Once Caja's ready, in any case gadget features will have to be carefully
> protected - or, features themselves might be cajoled, which if done in a
> sensible way will allow us to specify symbols which Caja would safely
> export, allowing binding in the gadget space itself. In either case, again
> it seems Caja's not mature enough for us to define this binding strategy
> yet, so punting is a reasonable option as you say :)
>
> Client-side perceived latency, your second listed risk, is indeed the
> likely
> bigger deal. I'd wager that several sites would be willing to take the
> server-cost hit in exchange for better client-side performance. To what
> extent do you think you could make the approach you use configurable?
>
> Lastly, why would this approach require one request per feature? It could be
> more efficient to bundle all features together in a single JS request, as the
> Java JS servlet supports, e.g.
> <script src="...feature1-version1:feature2-version2:feature3-version3.js"/>
>
> Granted, the efficiency of this approach depends on the features requested by
> all gadgets on a given page, to facilitate maximum script-sharing, and those
> optimizations are the sort of thing we could add sometime in the future atop
> the rendering and RPC calls.
>
> --John



-- 
.-. --- .--. ..-
R o p u

Re: Need some advice - features, inline or external?

Posted by John Hjelmstad <fa...@google.com>.
I'm not sure cajoling would be precluded by external <script src> includes.
Once Caja's ready, in any case gadget features will have to be carefully
protected - or, features themselves might be cajoled, which if done in a
sensible way will allow us to specify symbols which Caja would safely
export, allowing binding in the gadget space itself. In either case, again
it seems Caja's not mature enough for us to define this binding strategy
yet, so punting is a reasonable option as you say :)

Client-side perceived latency, your second listed risk, is indeed the likely
bigger deal. I'd wager that several sites would be willing to take the
server-cost hit in exchange for better client-side performance. To what
extent do you think you could make the approach you use configurable?

Lastly, why would this approach require one request per feature? It could be
more efficient to bundle all features together in a single JS request, as the
Java JS servlet supports, e.g.
<script src="...feature1-version1:feature2-version2:feature3-version3.js"/>

Granted, the efficiency of this approach depends on the features requested by
all gadgets on a given page, to facilitate maximum script-sharing, and those
optimizations are the sort of thing we could add sometime in the future atop
the rendering and RPC calls.
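
On the serving side that just means splitting the requested filename on ':',
something like this (a sketch; the real servlets differ in the details, and
the on-disk layout and helper are made up):

  <?php
  // Handle /gadgets/js/feature1:feature2.js by splitting the name on ':' and
  // concatenating each feature's javascript.
  function loadFeatureJs($feature) {
    $file = "features/$feature.js";        // made-up on-disk layout
    return is_file($file) ? file_get_contents($file) . "\n" : '';
  }

  $path     = isset($_SERVER['PATH_INFO']) ? basename($_SERVER['PATH_INFO']) : '';
  $features = explode(':', preg_replace('/\.js$/', '', $path));
  $js       = '';
  foreach ($features as $feature) {
    $js .= loadFeatureJs(preg_replace('/[^a-zA-Z0-9_.\-]/', '', $feature));
  }
  header('Content-Type: text/javascript; charset=UTF-8');
  echo $js;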

--John


Re: Need some advice - features, inline or external?

Posted by Chris Chabot <ch...@xs4all.nl>.
Thanks, that's exactly the info I needed :)

On May 10, 2008, at 1:59 AM, Kevin Brown wrote:

>
> That's how we wound up doing it for the java implementation. The  
> impact
> isn't significant enough to worry about it, and on firefox and  
> safari it
> actually seems to be faster.
>


Re: Need some advice - features, inline or external?

Posted by Kevin Brown <et...@google.com>.
On Fri, May 9, 2008 at 4:56 PM, Chris Chabot <ch...@xs4all.nl> wrote:

>
> php shindig has the exact same thing, and it was in fact what I was using
> for most of my experimentation (with some slight alterations to test out
> having separate script tags per feature vs one big blob).
>
> However, that didn't give me any idea how to bypass the onLoad latency that
> it would cause.
>
> If I just do a libs=opensocial-0.7:settitle:etc, then onLoadHandlers() is
> called before the gadget.* calls become available, and the whole thing
> errors out :)
>
> The way around this was to run runOnLoadHandlers() on the actual page load,
> with the added downside that this added some perceived latency... how do you
> deal with that?


That's how we wound up doing it for the java implementation. The impact
isn't significant enough to worry about it, and on firefox and safari it
actually seems to be faster.


>
>        -- Chris
>

Re: Need some advice - features, inline or external?

Posted by Chris Chabot <ch...@xs4all.nl>.
On May 10, 2008, at 1:29 AM, Kevin Brown wrote:

>
> You know this is exactly what the java server does currently when  
> you pass
> the libs parameter, right? The idea is that the container can  
> aggregate what
> they expect the most commonly used features to be, and those are  
> handled by
> /gadgets/js. Everything else is inlined. For example, on a production
> server:

php shindig has the exact same thing, and it was in fact what I was  
using for most of my experimentation (with some slight alterations to  
test out having separate script tags per feature vs one big blob).

However, that didn't give me any idea how to bypass the onLoad latency  
that it would cause.

If I just do a libs=opensocial-0.7:settitle:etc, then onLoadHandlers()  
is called before the gadget.* calls become available, and the whole  
thing errors out :)

The way around this was to run runOnLoadHandlers() on the actual  
page load, with the added downside that this added some perceived  
latency... how do you deal with that?
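
(For reference, the hack I'm testing boils down to emitting something like
this at the end of the rendered gadget document; sketch only, and the
version param is made up.)

  <?php
  // Load the batched feature js externally, then defer the feature onload
  // handlers until the whole document (including that js) has loaded.
  echo '<script src="/gadgets/js/dynamic-height:opensocial-0.7:settitle.js?v=1">'
     . '</script>' . "\n";
  echo '<script>window.onload = function() { gadgets.util.runOnLoadHandlers(); };'
     . '</script>' . "\n";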

	-- Chris

Re: Need some advice - features, inline or external?

Posted by Kevin Brown <et...@google.com>.
On Fri, May 9, 2008 at 1:31 PM, Chris Chabot <ch...@xs4all.nl> wrote:

> <snip>
>
> So what would work is that I make all javascript external (<script
> src="...">), generate script tags for each feature (and its dependencies)
> and modify the javascript handler (/gadgets/js) to only output the
> javascript for the requested feature.


You know this is exactly what the java server does currently when you pass
the libs parameter, right? The idea is that the container can aggregate what
they expect the most commonly used features to be, and those are handled by
/gadgets/js. Everything else is inlined. For example, on a production
server:

http://sandbox.gmodules.com/gadgets/js/settitle:dynamic-height.js

All the iframes on orkut (and other google properties) are generated by
setting the most commonly used libraries for the site. For example, on
orkut, you might see:

<iframe src="gmodules.com/gadgets/ifr?url...&libs=opensocial-0.7:dynamic-height:setprefs&...">

The generated url is then:

http://www.sandbox.gmodules.com/gadgets/js/dynamic-height:opensocial-0.7:settitle.js
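
In other words, the renderer just turns the libs parameter into a single
batched js URL, roughly like this (sketch only; details differ from the
real servlets):

  <?php
  // Turn the forced-libs parameter into one batched /gadgets/js URL.
  function buildJsUrl($jsHost, $libsParam) {
    $libs = array_filter(explode(':', $libsParam));
    sort($libs);                 // canonical order means better cache hits
    return $jsHost . '/gadgets/js/' . implode(':', $libs) . '.js';
  }

  // buildJsUrl('http://www.sandbox.gmodules.com',
  //            'settitle:dynamic-height:opensocial-0.7')
  // => http://www.sandbox.gmodules.com/gadgets/js/dynamic-height:opensocial-0.7:settitle.js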


>
> There are a few possible downsides i can identify though:
>
> More requests, one per feature, however with an expiration data in the far
> far future and a cache busting version param, this should be negligible..
> besides the amount of bandwidth used would go down tremendously (a few small
> kb for a gadget instead of 180kb or so because of all the inline
> javascript), so the combined browser side caching of savings on bandwidth /
> time to transfer all this info ... should actually have a positive effect
> right?
>
> The second risk is that it could add some perceived latency since the
> gadgets.config.init() and the onLoad handlers can't be called until the
> document has completed, which includes handling the javascript files, and
> whatever external resources the gadget includes..  this is probably the
> biggest problem of this solution.
>
> And finally, it would make cajoling impossible probably ... but that
> doesn't concern me so much right now since we don't have a mechanism for php
> shindig to do that anyhow :)
>
> With that 'small' modification, the pages/second shoots up to 630, a very
> significant increased, and that's with just a few hacks and not a proper
> implementation of this option.
>
> So the performance gains seem significant enough to consider this,  both in
> server load, pages/second and bandwidth saved.. however as mentioned,
> there's a few risks involved too.
>
> What do you all reckon would be the right solution here? Hope i can get
> your opinions on what would be the better choice here since I'm a bit torn
> between the two.
>
>        -- Chris
>
>