
Posted to dev@roller.apache.org by "(David) Ming Xia" <da...@ibol.biz> on 2010/05/21 18:09:15 UTC

Roller's implementation on conditional Get

Hi, Everyone.

   This is about the implementation of conditional GET in Roller 4.0.1.

   As far as I can see, Roller 4.0.1 supports conditional GET. On each request, Roller checks the ‘If-Modified-Since’ field in the HTTP request header and compares it with the ‘Last-Modified’ value on the server side. It then responds either with a fresh page and status code 200, or with status code 304 (Not Modified).
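A minimal sketch of the 200-versus-304 decision described above. It is written in plain Java so it runs without a servlet container. The class and method names are hypothetical and this is not Roller's PageServlet code. The sketch assumes the raw header value has already been read (for example with HttpServletRequest.getHeader("If-Modified-Since")) and that the server-side Last-Modified time is known in epoch milliseconds.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper: decides between 200 and 304 for a conditional GET. */
final class ConditionalGet {

    private ConditionalGet() {}

    /**
     * @param lastModifiedMillis server-side Last-Modified time in epoch milliseconds
     * @param ifModifiedSince    raw If-Modified-Since header value, or null if absent
     * @return 304 if the client's cached copy is still fresh, otherwise 200
     */
    static int status(long lastModifiedMillis, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // unconditional request: send the full page
        }
        long sinceMillis;
        try {
            sinceMillis = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant()
                    .toEpochMilli();
        } catch (DateTimeParseException e) {
            return 200; // an invalid date is ignored, so treat the request as unconditional
        }
        // HTTP dates have one-second resolution, so drop the server's milliseconds
        long lastModifiedTruncated = (lastModifiedMillis / 1000) * 1000;
        return lastModifiedTruncated <= sinceMillis ? 304 : 200;
    }

    /** Formats an epoch-millisecond time as an HTTP date, e.g. for the Last-Modified header. */
    static String httpDate(long epochMillis) {
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .format(Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC));
    }
}
```

HTTP dates carry whole seconds only, so the sketch truncates the server-side timestamp before comparing. Without that step, a page modified at 12:00:00.500 would never compare as unmodified against the header the server itself sent.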

   What concerns me is the part that retrieves ‘Last-Modified’. It is implemented in org.apache.roller.weblogger.ui.rendering.servlets.PageServlet. The attached sequence diagram shows the classes involved. Every time a weblog entry is added or changed, the ‘last-modified’ field of the corresponding row in the website table is updated. For every HTTP request, PageServlet has to run a JPA named query to get the ‘last-modified’ value. That value is not cached in memory, and the entities are not kept across persistence contexts either (anyhow...). So, as far as I can see, every request results in a real database query.

   But each page view usually involves at least ten HTTP requests: one for the text/html document plus requests for CSS files, JS files, images, and so on. So 10,000 simultaneous page views would mean at least 100,000 simultaneous database queries. Furthermore, in any serious production environment the database and the application server run on different tiers, and the connection between them is encrypted with SSL. As I see it, this works fine for a limited number of concurrent users, but when request volume goes up, the server may suddenly choke.
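The estimate above is simple multiplication, and a tiny model makes its assumption (one database lookup per HTTP request) explicit. The cacheHitRatio parameter is a hypothetical addition for comparison. It is not a Roller setting.

```java
/** Back-of-the-envelope model of database lookups caused by page views (illustrative only). */
final class LoadEstimate {

    private LoadEstimate() {}

    /**
     * @param pageViews       simultaneous page views
     * @param requestsPerPage HTTP requests per page view (HTML plus CSS, JS, images)
     * @param cacheHitRatio   hypothetical fraction of lookups answered from memory (0.0 to 1.0)
     * @return estimated number of database queries
     */
    static long estimatedQueries(long pageViews, int requestsPerPage, double cacheHitRatio) {
        if (cacheHitRatio < 0.0 || cacheHitRatio > 1.0) {
            throw new IllegalArgumentException("cacheHitRatio must be between 0 and 1");
        }
        long requests = pageViews * requestsPerPage; // one lookup per HTTP request
        return Math.round(requests * (1.0 - cacheHitRatio));
    }
}
```

With a cacheHitRatio of 0, estimatedQueries(10000, 10, 0.0) reproduces the 100,000 figure. At 0.9 the same traffic would need about 10,000 queries.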
 
   I would appreciate it if someone could respond and explain this, or offer some advice.
 
 
Thank you very much.
 
 
David

Ehcache on conditional Get

Posted by "(David) Ming Xia" <da...@ibol.biz>.

I have learned that Ehcache supports conditional GET, but I haven't had a chance to look into it yet. I would be grateful if someone could shed some light on this topic: an explanation, sample code, links, or any thoughts or hints.
 
Thank you very much
 
 
David
 
--- On Mon, 5/24/10, (David) Ming Xia <da...@ibol.biz> wrote:


From: (David) Ming Xia <da...@ibol.biz>
Subject: Re: Roller's implementation on conditional Get
To: "John G. Moylan" <jo...@nuatech.net>
Cc: "Mailing List Apache Roller User" <us...@roller.apache.org>, "Mailing List Apache Roller Developer" <de...@roller.apache.org>
Date: Monday, May 24, 2010, 11:48 AM


Thank you, John, for your response.
 
   Roller's users frequently add new entries and update existing ones. The catch is that every time an entry is added or updated, the parent weblog’s last-modified time is set to the current time, and this change is written to the website table.
 
   Roller caches the rendered content of each requested page. For each web request, Roller queries the website table for the last-modified time and compares it against the If-Modified-Since value in the HTTP request header to evaluate the freshness of the cache. So in the current design, the website table is a point we cannot get around. This could be resolved only if Roller updated a time-out cache with the last-modified time each time an entry is added or updated, and checked that time-out cache, instead of the database table, on each page request.
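A minimal sketch of the time-out cache proposed above, assuming a single JVM. LastModifiedCache, the loader function (standing in for the existing JPA named query) and the TTL are hypothetical. In this design, the write path calls update() after an entry is added or changed, and the read path calls get() on each page request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical time-out cache of each weblog's last-modified time, keyed by weblog handle.
 * Reads hit the database (via the loader) only on a miss or after the TTL has expired.
 */
final class LastModifiedCache {

    private static final class Entry {
        final long lastModified;
        final long expiresAt;

        Entry(long lastModified, long expiresAt) {
            this.lastModified = lastModified;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, Long> loader; // stands in for the JPA named query
    private final long ttlMillis;
    private final LongSupplier clock;            // injectable so tests can control time

    LastModifiedCache(Function<String, Long> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Read path: called for each page request. */
    long get(String handle) {
        long now = clock.getAsLong();
        Entry e = entries.get(handle);
        if (e == null || e.expiresAt <= now) {
            long value = loader.apply(handle); // database round trip only here
            e = new Entry(value, now + ttlMillis);
            entries.put(handle, e);
        }
        return e.lastModified;
    }

    /** Write path: called after an entry add/update has been committed. */
    void update(String handle, long lastModified) {
        entries.put(handle, new Entry(lastModified, clock.getAsLong() + ttlMillis));
    }
}
```

Each JVM keeps its own map, so this alone does not keep several clustered servers in sync. The TTL only bounds how stale one server's copy can get. A concurrent cache miss can also briefly put back an older value read from the database, and that is again bounded by the TTL.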
 
   I would also suggest that Roller support conditional GET only for text/html content, and that all CSS, JS and image files be served by a separate web component that does not support conditional GET.
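One way to express that split is a request classifier that decides whether a request needs the last-modified lookup at all. The suffix list and class name are assumptions for illustration. A real deployment might instead route static files to a separate web component or server, as suggested above.

```java
import java.util.Locale;

/**
 * Hypothetical classifier: only rendered pages consult the weblog's last-modified time;
 * static theme resources are left to plain expiry headers instead of conditional GET.
 */
final class ConditionalGetPolicy {

    // Assumed list of static-resource extensions; adjust to the theme's actual files.
    private static final String[] STATIC_SUFFIXES = {
        ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico"
    };

    private ConditionalGetPolicy() {}

    static boolean needsLastModifiedLookup(String requestPath) {
        String path = requestPath.toLowerCase(Locale.ROOT);
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query); // ignore any query string
        }
        for (String suffix : STATIC_SUFFIXES) {
            if (path.endsWith(suffix)) {
                return false; // static resource: no database lookup
            }
        }
        return true; // treat everything else as a text/html page
    }
}
```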

   Speaking of caching, Roller seems to be designed to use Ehcache (I only see the jar and the configuration file; I did not find any corresponding API calls. Hmm…). I don’t know much about memcached. Could you give a comparison of memcached and Ehcache?

 
Thanks.
 
David     


--- On Mon, 5/24/10, John G. Moylan <jo...@nuatech.net> wrote:


From: John G. Moylan <jo...@nuatech.net>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Monday, May 24, 2010, 7:59 AM



Hi David,


If you are concerned about performance, you should use memcached to cache JPA lookups. You can also set explicit cache expiry times on your files. The last-modified issue you describe above is the same on most dynamic systems that support conditional requests based on a modification time or an ETag.
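For the suggestion to set explicit cache expiry times on files, here is a sketch of the two response headers involved. With Cache-Control: max-age set, a browser can reuse the file until it expires without sending even a conditional request, so the server does no work for it at all. Expires is the older HTTP/1.0 equivalent. The class name is hypothetical, and the one-day max-age used below is an arbitrary assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Sketch of explicit expiry headers for static files; the max-age value is up to the deployer. */
final class ExpiryHeaders {

    private ExpiryHeaders() {}

    /** Value for the HTTP/1.1 Cache-Control header. */
    static String cacheControl(long maxAgeSeconds) {
        return "public, max-age=" + maxAgeSeconds;
    }

    /** Value for the older HTTP/1.0 Expires header: now plus max-age, as an HTTP date. */
    static String expires(Instant now, long maxAgeSeconds) {
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .format(now.plusSeconds(maxAgeSeconds).atZone(ZoneOffset.UTC));
    }
}
```

For example, cacheControl(86400) yields "public, max-age=86400", which allows one day of reuse.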


J




-- 
_____________
John G. Moylan

Re: Roller's implementation on conditional Get

Posted by "(David) Ming Xia" <da...@ibol.biz>.

Sorry, I should have said that Roller checks the last-modified time to evaluate the freshness of the web browser's cache. The freshness of Roller's own page cache is maintained separately.

-David

Re: Resend -- About weblog view data access

Posted by Dave <sn...@gmail.com>.

On Thu, May 27, 2010 at 5:59 PM, (David) Ming Xia
<da...@ibol.biz> wrote:
>    The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables; they are not distributed, and there is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple servers. Yet most production environments nowadays are clustered, composed of multiple JVMs and application server runtimes.

That's not completely true. Roller has pluggable page caching, and
you can plug in memcached if you want a distributed cache. Code for
the Roller Memcache plugin is available on roller.dev.java.net; it's
not part of Roller because, I think, it has an LGPL dependency.

For caching of database results, in the past we have used Hibernate's
L2 cache feature, which can also be backed by memcached for
distributed cache. Roller has since switched to OpenJPA, but OpenJPA
also has a pluggable cache.

I would recommend pursuing the OpenJPA L2 cache. It would be better if
Roller did not have to implement object caching itself but could
instead rely on the persistence engine to do that.

- Dave

Resend -- About weblog view data access

Posted by "(David) Ming Xia" <da...@ibol.biz>.

Hi, Dave.

   Sorry for the garbled text in my previous message. I am resending it below.

   This is still about data access for weblog views.

   The handles specified in the Roller property rendering.weblogMapper.rollerProtectedUrls are all for the user account console and will never appear in user-created websites, so they are not a concern. What concerns us are the requests with the URI pattern ‘/roller-ui/rendering/resources’, which are specified in theme.xml as <resource/> elements. WeblogRequestMapper validates the handle of an incoming text/html page request, and then validates the handle of every subsequent request that the browser sends while following the URL links in that page. The validating function is WeblogRequestMapper.isWeblog(String potentialHandle).

   For example, a web page with ten links to CSS, JS and image files produces one request for the page followed by ten more, eleven requests in total. For each request Roller does the following:

     1. Retrieve a connection instance from the connection pool, or create a new JDBC connection.

     2. Retrieve the prepared statement from the server's statement cache, or create a prepared statement for the named query.

     3. Set the parameter ‘handle’, execute the SQL query, and get all the data for the specified weblog, including its root category and categories.

     4. Recycle the connection, or close it and discard it for GC.

     5. Create a new weblog object and populate it with the data.

   So in this example, for one web page request Roller consumes eleven JDBC connections and creates eleven weblog objects just to check whether the weblog exists. If some websites on Roller receive a high volume of HTTP requests, the Roller database could easily be overwhelmed and deadlock. With all the later requests queued up, memory usage will hit the ceiling. The database then becomes a single point of failure. Without the database there to validate the weblog handle for every request, and Last-Modified for every text/html request, visitors will see a blank white page that goes nowhere. I believe this is quite possible. The technical parameters and typical usage of database servers make it clear that they are not designed for the kind of task Roller now gives them: validating every HTTP request.

    I would suggest using a cache for weblog page views. Put simply, Roller should cache weblogs and weblog entries. Roller users manage their accounts, persist changes to the database, and push those changes into the cache. Users' passwords are not cached, for security reasons. When Roller viewers retrieve web content, everything they see comes from the cache; they should never touch the database. Data such as referrer addresses or hit counts would be kept in the cache and persisted to the database when the server stops, or on an administrator's command.
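The write-through scheme described above can be sketched as follows. Everything here is hypothetical. `WeblogViewCache` is not a Roller class, and the `persist` callback stands in for the real JPA or DAO call. Writes go to the database first and then to the cache, and reads are served from the cache only. Hit counts are buffered in memory until they are drained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BiConsumer;

/** Hypothetical write-through view cache with buffered hit counts. */
final class WeblogViewCache<V> {
    private final Map<String, V> entries = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> hits = new ConcurrentHashMap<>();
    private final BiConsumer<String, V> persist; // stands in for the real DAO/JPA call

    WeblogViewCache(BiConsumer<String, V> persist) {
        this.persist = persist;
    }

    /** Author-side update: persist first, then refresh the cached copy. */
    void save(String key, V value) {
        persist.accept(key, value);
        entries.put(key, value);
    }

    /** Viewer-side read: cache only, counting the hit; returns null on a miss. */
    V view(String key) {
        hits.computeIfAbsent(key, k -> new LongAdder()).increment();
        return entries.get(key);
    }

    /**
     * Drains buffered hit counts, e.g. at shutdown or on an admin command.
     * Counts are approximate if views happen concurrently with the drain.
     */
    Map<String, Long> drainHitCounts() {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, LongAdder> e : hits.entrySet()) {
            long n = e.getValue().sumThenReset();
            if (n > 0) {
                out.put(e.getKey(), n);
            }
        }
        return out;
    }
}
```

On a cache miss, `view` simply returns `null`. A real system would still need a way to load the entry from the database, or would have to pre-warm the cache at startup.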

   The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables. They are not distributed, and there is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server threads. Yet most production environments today are clustered, composed of multiple JVMs and application server runtimes.
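Wherever the ‘Last-Modified’ value ends up living, the conditional GET decision itself needs no database access once that value is available. The sketch below assumes two things: the weblog's last-modified time is available as epoch milliseconds (for example, from a shared cache), and the If-Modified-Since header has already been parsed. The Servlet API's `HttpServletRequest.getDateHeader` does that parsing and returns -1 when the header is absent. `ConditionalGet` is a hypothetical helper, not PageServlet's code.

```java
/** Decides between 304 and 200 for a conditional GET; a sketch, not Roller's code. */
final class ConditionalGet {
    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    private ConditionalGet() {}

    /**
     * @param ifModifiedSince parsed If-Modified-Since in epoch millis, or -1 if absent
     * @param lastModified    the weblog's last-modified time in epoch millis
     */
    static int status(long ifModifiedSince, long lastModified) {
        if (ifModifiedSince < 0) {
            return SC_OK; // unconditional request: always render
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        long imsSeconds = ifModifiedSince / 1000;
        long lastModifiedSeconds = lastModified / 1000;
        return lastModifiedSeconds <= imsSeconds ? SC_NOT_MODIFIED : SC_OK;
    }
}
```

In a servlet, one would call `ConditionalGet.status(request.getDateHeader("If-Modified-Since"), lastModified)`, then either send 304 or render the page. Without the truncation, a page modified at 5.999 s would never match a header that can only say 5 s.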

   I learned that Ehcache supports a distributed map, and I know that the WebSphere cache instance implements the IBM distributed map. The best solution for Roller would be an interface to a third-party distributed cache accessed through a JNDI lookup. Otherwise, bundling Ehcache with Roller would also be very good.
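The JNDI idea can be sketched in a few lines: look up a container-provided map under some name, and fall back to a local map when nothing is bound. The JNDI name and the `CacheResolver` class are hypothetical. The code also assumes the container's distributed cache is exposed as a `java.util.Map`. As far as I know, that is the case for IBM's DistributedMap, but it is worth confirming for the container in question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.naming.InitialContext;
import javax.naming.NamingException;

/** Hypothetical resolver: container-provided distributed map via JNDI, else a local map. */
final class CacheResolver {
    private CacheResolver() {}

    @SuppressWarnings("unchecked")
    static Map<String, Object> resolve(String jndiName) {
        try {
            Object bound = new InitialContext().lookup(jndiName);
            if (bound instanceof Map) {
                // Assumed: the container binds its distributed cache as a java.util.Map.
                return (Map<String, Object>) bound;
            }
        } catch (NamingException e) {
            // No JNDI provider, or nothing bound under jndiName: fall through.
        }
        return new ConcurrentHashMap<>(); // local, non-distributed fallback
    }
}
```

Outside a container, with no JNDI provider configured, `lookup` throws `NoInitialContextException` (a `NamingException`), so the resolver quietly returns a local `ConcurrentHashMap`.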

Thank you.

David

--- On Wed, 5/26/10, (David) Ming Xia <da...@ibol.biz> wrote:

From: (David) Ming Xia <da...@ibol.biz>
Subject: About weblog view data access
To: user@roller.apache.org, "Mailing List Apache Roller Developer" <de...@roller.apache.org>
Date: Wednesday, May 26, 2010, 8:30 PM

Hi, Dave.

  Still, this is about the weblog view data access.

  The names listed in the Roller property rendering.weblogMapper.rollerProtectedUrls are all for the user account console. They will not appear in user-created websites, so they are not a concern. What concerns us are requests with the URI pattern ‘/roller-ui/rendering/resources’, which are specified in theme.xml as <resource/> elements. WeblogRequestMapper validates the handle of an incoming text/html page request. It then validates the handle of every subsequent request the browser sends while following the links in that page. The validating function is WeblogRequestMapper.isWeblog(String potentialHandle).

  Take an example: for a web page with ten links to CSS, JS, and image files, the browser sends one request for the page and then further requests for the linked files, eleven requests in all. For each request Roller does the following:

     1. Retrieve a connection instance from the connection pool, or create a new JDBC connection.

     2. Retrieve the prepared statement from the server statement cache, or create a prepared statement for the named query.

     3. Set the parameter ‘handle’ and execute the SQL query.

     4. Get all the data for the specified weblog, including instances of the root category and categories.

     5. Recycle the connection, or close it and discard it for GC.

     6. Create a new weblog object and populate it with the data.

   So in this example, for one web page request Roller consumes eleven JDBC connections and creates eleven weblog objects just to check whether the weblog exists. If some websites on Roller receive a high volume of HTTP requests, the Roller database could easily be overwhelmed and deadlock. With all the later requests queued up, memory usage will hit the ceiling. The database then becomes a single point of failure. Without the database there to validate the weblog handle for every request, and Last-Modified for every text/html request, visitors will see a blank white page that goes nowhere. I believe this is quite possible. The technical parameters and typical usage of database servers make it clear that they are not designed for the kind of task Roller now gives them: validating every HTTP request.

    I would suggest using a cache for weblog page views. Put simply, Roller should cache weblogs and weblog entries. Roller users manage their accounts, persist changes to the database, and push those changes into the cache. Users' passwords are not cached, for security reasons. When Roller viewers retrieve web content, everything they see comes from the cache; they should never touch the database. Data such as referrer addresses or hit counts would be kept in the cache and persisted to the database when the server stops, or on an administrator's command.

   The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables. They are not distributed, and there is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server threads. Yet most production environments today are clustered, composed of multiple JVMs and application server runtimes.

   I learned that Ehcache supports a distributed map, and I know that the WebSphere cache instance implements the IBM distributed map. The best solution for Roller would be an interface to a third-party distributed cache accessed through a JNDI lookup. Otherwise, bundling Ehcache with Roller would also be very good.

Thank you.

David

--- On Wed, 5/26/10, Dave <sn...@gmail.com> wrote:

From: Dave <sn...@gmail.com>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Wednesday, May 26, 2010, 7:59 AM

On Wed, May 26, 2010 at 12:11 AM, (David) Ming Xia
<da...@ibol.biz> wrote:
>    I took a look into it and I found another place that has very intensive database queries.
>
>    RequestMappingFilter.doFilter() --> WeblogRequestMapper.handleRequest().
>
>   RequestMappingFilter's URL mapping is /*, so it checks every HTTP request.
>
>   WeblogRequestMapper.handleRequest() verifies ALL requests, including those for CSS, JS, and image files, with named JPA queries.
>
>
>   Actually, both PageServlet and RequestMappingFilter query the weblog by handle. It looks like the database is being used as a hashtable in these two functions, whereas a database is usually used for account data transactions and relational data management.
>
>   Now for each web page request there are at least 'eleven' database queries: one for the text/html content in PageServlet and ten in the mapping filter for everything, including the text/html.
>
>   I feel that there could be even more database calls, since many people work on Roller and everyone tends to add a few more.
>
>    It seems that there should be a top-down design solution for this issue.
>
>     I'd like to hear from you.

Hi David,

You are correct: WeblogRequestMapper is invoked on every request, but
it does nothing when it encounters URLs that begin with these patterns:

   rendering.weblogMapper.rollerProtectedUrls=\
   roller-ui,images,theme,themes,CommentAuthenticatorServlet,\
   index.jsp,favicon.ico,robots.txt,\
   page,flavor,rss,atom,language,search,comments,rsd,resource,xmlrpc,planetrss

It ignores static theme resources (images, CSS, JS, etc.) and
everything else that is not dynamically generated by a weblog page
template. Perhaps the problem is not quite as bad as you think.

There have not been that many people working on Roller and the ones
that have worked on the code have been pretty disciplined about when
database calls are made. But of course, even disciplined developers
make mistakes. I'm sure there is much room for improvement and I
encourage you to continue your research into performance bottlenecks.

If you have a proposal for a top-down solution, or some patches to
improve things -- I'd be happy to review them or even commit them for
you if they look good.

- Dave

About weblog view data access

Posted by "(David) Ming Xia" <da...@ibol.biz>.

Hi, Dave.

  Still, this is about the weblog view data access.

  The names listed in the Roller property rendering.weblogMapper.rollerProtectedUrls are all for the user account console. They will not appear in user-created websites, so they are not a concern. What concerns us are requests with the URI pattern ‘/roller-ui/rendering/resources’, which are specified in theme.xml as <resource/> elements. WeblogRequestMapper validates the handle of an incoming text/html page request. It then validates the handle of every subsequent request the browser sends while following the links in that page. The validating function is WeblogRequestMapper.isWeblog(String potentialHandle).

  Take an example: for a web page with ten links to CSS, JS, and image files, the browser sends one request for the page and then further requests for the linked files, eleven requests in all. For each request Roller does the following:

     1. Retrieve a connection instance from the connection pool, or create a new JDBC connection.

     2. Retrieve the prepared statement from the server statement cache, or create a prepared statement for the named query.

     3. Set the parameter ‘handle’ and execute the SQL query.

     4. Get all the data for the specified weblog, including instances of the root category and categories.

     5. Recycle the connection, or close it and discard it for GC.

     6. Create a new weblog object and populate it with the data.

   So in this example, for one web page request Roller consumes eleven JDBC connections and creates eleven weblog objects just to check whether the weblog exists. If some websites on Roller receive a high volume of HTTP requests, the Roller database could easily be overwhelmed and deadlock. With all the later requests queued up, memory usage will hit the ceiling. The database then becomes a single point of failure. Without the database there to validate the weblog handle for every request, and Last-Modified for every text/html request, visitors will see a blank white page that goes nowhere. I believe this is quite possible. The technical parameters and typical usage of database servers make it clear that they are not designed for the kind of task Roller now gives them: validating every HTTP request.

    I would suggest using a cache for weblog page views. Put simply, Roller should cache weblogs and weblog entries. Roller users manage their accounts, persist changes to the database, and push those changes into the cache. Users' passwords are not cached, for security reasons. When Roller viewers retrieve web content, everything they see comes from the cache; they should never touch the database. Data such as referrer addresses or hit counts would be kept in the cache and persisted to the database when the server stops, or on an administrator's command.

   The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables. They are not distributed, and there is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server threads. Yet most production environments today are clustered, composed of multiple JVMs and application server runtimes.

   I learned that Ehcache supports a distributed map, and I know that the WebSphere cache instance implements the IBM distributed map. The best solution for Roller would be an interface to a third-party distributed cache accessed through a JNDI lookup. Otherwise, bundling Ehcache with Roller would also be very good.

Thank you.

David

Re: Roller's implementation on conditional Get

Posted by Dave <sn...@gmail.com>.

On Wed, May 26, 2010 at 12:11 AM, (David) Ming Xia
<da...@ibol.biz> wrote:
>    I took a look into it and I found another place that has very intensive database queries.
>
>    RequestMappingFilter.doFilter() --> WeblogRequestMapper.handleRequest().
>
>   RequestMappingFilter's URL mapping is /*, so it checks every HTTP request.
>
>   WeblogRequestMapper.handleRequest() verifies ALL requests, including those for CSS, JS, and image files, with named JPA queries.
>
>
>   Actually, both PageServlet and RequestMappingFilter query the weblog by handle. It looks like the database is being used as a hashtable in these two functions, whereas a database is usually used for account data transactions and relational data management.
>
>   Now for each web page request there are at least 'eleven' database queries: one for the text/html content in PageServlet and ten in the mapping filter for everything, including the text/html.
>
>   I feel that there could be even more database calls, since many people work on Roller and everyone tends to add a few more.
>
>    It seems that there should be a top-down design solution for this issue.
>
>     I'd like to hear from you.

Hi David,

You are correct, WeblogRequestMapper is invoked on every request, but
does nothing when it encounters URLs that begin with these patterns:

   rendering.weblogMapper.rollerProtectedUrls=\
   roller-ui,images,theme,themes,CommentAuthenticatorServlet,\
   index.jsp,favicon.ico,robots.txt,\
   page,flavor,rss,atom,language,search,comments,rsd,resource,xmlrpc,planetrss

It ignores static theme resources (images, CSS, JS, etc.) and
everything else that is not dynamically generated by a weblog page
template. Perhaps the problem is not quite as bad as you think.

There have not been that many people working on Roller and the ones
that have worked on the code have been pretty disciplined about when
database calls are made. But of course, even disciplined developers
make mistakes. I'm sure there is much room for improvement and I
encourage you to continue your research into performance bottlenecks.

If you have a proposal for a top-down solution, or some patches to
improve things -- I'd be happy to review them or even commit them for
you if they look good.

- Dave

Re: Roller's implementation on conditional Get

Posted by "(David) Ming Xia" <da...@ibol.biz>.

Hi, Dave.

I looked into it and found another place that makes very intensive database queries.

RequestMappingFilter.doFilter() --> WeblogRequestMapper.handleRequest().

RequestMappingFilter's URL mapping is /*, so it checks every HTTP request.

WeblogRequestMapper.handleRequest() verifies ALL requests with named JPA queries, including requests for CSS, JS and image files.

Actually, both PageServlet and RequestMappingFilter look up the weblog by its handle. It looks like the database is being used as a hashtable in these two places, whereas a database is usually used for transactional account data and relational data management.
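One way to stop using the database "as a hashtable" is to put a real hashtable in front of it. The following sketch is illustrative and is not part of Roller. It memoizes handle-to-weblog-id lookups in a bounded LRU map built on java.util.LinkedHashMap. HandleLookupCache and the loader function, which stands in for the JPA named query, are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: bounded LRU cache for "weblog handle -> weblog id"
// lookups, so repeated requests for the same weblog skip the database.
final class HandleLookupCache {
    private final Map<String, String> lru;
    // Stand-in for the real database lookup; called only on a cache miss.
    private final Function<String, String> loader;

    HandleLookupCache(final int maxEntries, Function<String, String> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // drop the least recently used entry
            }
        };
    }

    // synchronized because get() on an access-ordered LinkedHashMap mutates it
    synchronized String lookup(String handle) {
        String id = lru.get(handle);
        if (id == null) {
            id = loader.apply(handle); // cache miss: go to the database
            if (id != null) {
                lru.put(handle, id);
            }
        }
        return id;
    }

    /** Call when a weblog is renamed or deleted. */
    synchronized void evict(String handle) {
        lru.remove(handle);
    }
}
```

The evict method matters because a renamed or deleted handle would otherwise keep resolving. In a clustered deployment each node has its own map, so eviction would also have to be propagated, and this sketch does not handle that.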

So for each web page request there are at least eleven database queries: one in PageServlet for the text/html content, and ten in the mapping filter for everything, including the text/html.

I suspect there could be even more database calls, since many people work on Roller and everyone tends to add a few more.

It seems that there should be a top-down design solution for this issue.

I would like to hear your thoughts.

David

--- On Tue, 5/25/10, Dave <sn...@gmail.com> wrote:

From: Dave <sn...@gmail.com>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Tuesday, May 25, 2010, 9:14 PM

On Tue, May 25, 2010 at 8:59 PM, (David) Ming Xia
<da...@ibol.biz> wrote:
> Thank you very much Dave for your response.
>
> You are right. Only the text/html content is mapped to the URI /roller-ui/rendering/page, where PageServlet handles it and invokes a JPA named query for the weblog. All the resource files are mapped to the URI '/roller-ui/rendering/resources'. Roller is very complicated, indeed.
>
> I would like to ask one more question. We now know that each request for a weblog page triggers one named JPA query, i.e. one database SELECT. What if someone launches an attack on the weblog pages of a Roller site? While the registration and login pages can be protected with a CAPTCHA, weblog pages have to withstand whatever comes. The database server then becomes Roller's bottleneck. Roller itself should be easy to scale up by other means, such as clustering.
>
> What do you think we should do to protect Roller against an attack like the one described above? Would it be better to cache last-modified?

Yes, caching last-modified for each weblog could help here -- you
could do this via relatively small changes to the PageServlet and I'd
recommend FeedServlet too.

- Dave

Re: Roller's implementation on conditional Get

Posted by Dave <sn...@gmail.com>.

On Tue, May 25, 2010 at 8:59 PM, (David) Ming Xia
<da...@ibol.biz> wrote:
> Thank you very much Dave for your response.
>
> You are right. Only the text/html content is mapped to the URI /roller-ui/rendering/page, where PageServlet handles it and invokes a JPA named query for the weblog. All the resource files are mapped to the URI '/roller-ui/rendering/resources'. Roller is very complicated, indeed.
>
> I would like to ask one more question. We now know that each request for a weblog page triggers one named JPA query, i.e. one database SELECT. What if someone launches an attack on the weblog pages of a Roller site? While the registration and login pages can be protected with a CAPTCHA, weblog pages have to withstand whatever comes. The database server then becomes Roller's bottleneck. Roller itself should be easy to scale up by other means, such as clustering.
>
> What do you think we should do to protect Roller against an attack like the one described above? Would it be better to cache last-modified?

Yes, caching last-modified for each weblog could help here -- you
could do this via relatively small changes to the PageServlet and I'd
recommend FeedServlet too.
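The following sketch shows what caching last-modified per weblog might look like. It assumes the code path that saves entries can call a hook when a weblog changes. LastModifiedCache, weblogChanged and the loader function are hypothetical names. The loader stands in for the existing JPA named query and runs only on a cache miss.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch: per-weblog last-modified timestamps held in memory,
// shared by PageServlet- and FeedServlet-style conditional GET checks.
final class LastModifiedCache {
    private final ConcurrentMap<String, Long> byHandle = new ConcurrentHashMap<>();
    // Stand-in for the existing JPA named query; only called on a cache miss.
    private final ToLongFunction<String> loader;

    LastModifiedCache(ToLongFunction<String> loader) {
        this.loader = loader;
    }

    /** Last-modified time (epoch millis) for the weblog with this handle. */
    long get(String handle) {
        // computeIfAbsent applies the loader at most once per absent key
        return byHandle.computeIfAbsent(handle, h -> loader.applyAsLong(h));
    }

    /** Hook for the save path: record the new timestamp so readers never query. */
    void weblogChanged(String handle, long newLastModified) {
        byHandle.put(handle, newLastModified);
    }

    /** Drop the entry, e.g. when the weblog is deleted; the next get() reloads it. */
    void invalidate(String handle) {
        byHandle.remove(handle);
    }
}
```

With this in place, the last-modified check for a weblog whose timestamp is already cached needs no database access. As with any per-node cache, a cluster would need the update broadcast to every node.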

- Dave

Re: Roller's implementation on conditional Get

Posted by "(David) Ming Xia" <da...@ibol.biz>.

Thank you very much Dave for your response.

You are right. Only the text/html content is mapped to the URI /roller-ui/rendering/page, where PageServlet handles it and invokes a JPA named query for the weblog. All the resource files are mapped to the URI '/roller-ui/rendering/resources'. Roller is very complicated, indeed.

I would like to ask one more question. We now know that each request for a weblog page triggers one named JPA query, i.e. one database SELECT. What if someone launches an attack on the weblog pages of a Roller site? While the registration and login pages can be protected with a CAPTCHA, weblog pages have to withstand whatever comes. The database server then becomes Roller's bottleneck. Roller itself should be easy to scale up by other means, such as clustering.

What do you think we should do to protect Roller against an attack like the one described above? Would it be better to cache last-modified?

Thank you very much.  

David

--- On Tue, 5/25/10, Dave <sn...@gmail.com> wrote:

From: Dave <sn...@gmail.com>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Tuesday, May 25, 2010, 8:47 AM

On Fri, May 21, 2010 at 12:09 PM, (David) Ming Xia
<da...@ibol.biz> wrote:
> This is about the implementation of conditional GET in Roller 4.0.1.
> As far as I can see, Roller 4.0.1 supports conditional GET. On each request, Roller checks the ‘If-Modified-Since’ field in the HTTP header and compares it with the ‘Last-Modified’ attribute on the server side. It then responds either with a fresh page and status code 200, or with status code 304.

That is true for blog pages and feeds only.

> What concerns me is the part that retrieves ‘Last-Modified’. It is implemented in org.apache.roller.weblogger.ui.rendering.servlets.PageServlet. Attached is a sequence diagram depicting the related classes.

I don't see any sequence diagram. This mailing list does not accept
attachments. Perhaps you could post the picture somewhere and send a
URL?

> Every time a weblog entry is added or changed, the ‘last-modified’
> field of the corresponding website table is updated. For any HTTP
> request, PageServlet has to run a JPA named query to get the
> ‘last-modified’ value. That value is not cached in memory, and it is
> not the kind of entity that floats across persistence contexts (anyhow...).
> So as far as I can see, it is a hard query that goes to the database every time.
> But for one page, there are usually at least ten HTTP requests, including requests for the text/html file, CSS files, JS files, images, and so on.

Right, but CSS files and JS files that are file system resources
(theme files, etc.) are served directly by the Servlet Engine, which
has its own conditional GET implementation, and NOT through the Roller
PageServlet.

> So for 10,000 simultaneous page requests, there will be at least 100,000 simultaneous database queries. Furthermore, in any serious production environment the database and the application server are on different tiers, and the connection between them is encrypted with SSL. So as I see it, things are fine for a limited number of concurrent users, but when request volume goes up, the server may suddenly choke.

When something in a weblog changes, we invalidate the weblog's cache,
and this works well because there are a lot more reads than writes.
There might be a couple of bloggers and thousands of readers and
subscribers, so the cache is rarely invalidated.

And like I said, the page servlet caches only pages so what you said
about 100,000 database queries is not true unless you are storing CSS,
JS and other static resources as Roller page templates -- which you
should not be doing.

- Dave

Re: Roller's implementation on conditional Get

Posted by Dave <sn...@gmail.com>.

On Fri, May 21, 2010 at 12:09 PM, (David) Ming Xia
<da...@ibol.biz> wrote:
> This is about the implementation of conditional GET in Roller 4.0.1.
> As far as I can see, Roller 4.0.1 supports conditional GET. On each request, Roller checks the ‘If-Modified-Since’ field in the HTTP header and compares it with the ‘Last-Modified’ attribute on the server side. It then responds either with a fresh page and status code 200, or with status code 304.

That is true for blog pages and feeds only.
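For reference, the comparison at the heart of a conditional GET can be sketched in plain Java. The sketch uses java.time only and no servlet API, and the class ConditionalGet and its methods are illustrative rather than Roller's. HTTP dates have one-second resolution, so the server's millisecond timestamp is truncated before comparing. Without that step, a page modified at 12:00:00.500 would never match a header echoing 12:00:00 and would always be re-sent.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Illustrative sketch of the If-Modified-Since / Last-Modified comparison.
final class ConditionalGet {
    // IMF-fixdate, e.g. "Fri, 21 May 2010 16:09:15 GMT"
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Value for the Last-Modified response header. */
    static String formatHttpDate(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    /** 304 if the client's copy is still current, otherwise 200. */
    static int statusFor(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) {
            return 200; // unconditional request: send the full page
        }
        try {
            long since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant().toEpochMilli();
            // truncate to whole seconds, matching the header's resolution
            long lastModifiedSecond = lastModifiedMillis - (lastModifiedMillis % 1000);
            return lastModifiedSecond <= since ? 304 : 200;
        } catch (DateTimeParseException e) {
            return 200; // malformed header: ignore it and send the full page
        }
    }
}
```

In a servlet, HttpServletRequest.getDateHeader("If-Modified-Since") can do the parsing instead (it returns -1 when the header is absent). A 304 result would be sent with response.setStatus(HttpServletResponse.SC_NOT_MODIFIED) and no body. Whatever the source of the timestamp, it is cheap only if looking it up is cheap, which is the point of this thread.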

> What concerns me is the part that retrieves ‘Last-Modified’. It is implemented in org.apache.roller.weblogger.ui.rendering.servlets.PageServlet. Attached is a sequence diagram depicting the related classes.

I don't see any sequence diagram. This mailing list does not accept
attachments. Perhaps you could post the picture somewhere and send a
URL?

> Every time a weblog entry is added or changed, the ‘last-modified’
> field of the corresponding website table is updated. For any HTTP
> request, PageServlet has to run a JPA named query to get the
> ‘last-modified’ value. That value is not cached in memory, and it is
> not the kind of entity that floats across persistence contexts (anyhow...).
> So as far as I can see, it is a hard query that goes to the database every time.
> But for one page, there are usually at least ten HTTP requests, including requests for the text/html file, CSS files, JS files, images, and so on.

Right, but CSS files and JS files that are file system resources
(theme files, etc.) are served directly by the Servlet Engine, which
has its own conditional GET implementation, and NOT through the Roller
PageServlet.

> So for 10,000 simultaneous page requests, there will be at least 100,000 simultaneous database queries. Furthermore, in any serious production environment the database and the application server are on different tiers, and the connection between them is encrypted with SSL. So as I see it, things are fine for a limited number of concurrent users, but when request volume goes up, the server may suddenly choke.

When something in a weblog changes, we invalidate the weblog's cache,
and this works well because there are a lot more reads than writes.
There might be a couple of bloggers and thousands of readers and
subscribers, so the cache is rarely invalidated.
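The invalidate-on-write pattern described here can be sketched as a two-level map. Rendered pages are grouped by weblog handle, so a change to one weblog drops all of its pages in a single operation without touching other weblogs. WeblogPageCache is a hypothetical name, and this sketch is not Roller's actual cache implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of write-triggered invalidation: rendered pages are
// grouped per weblog so one change clears exactly that weblog's entries.
final class WeblogPageCache {
    private final Map<String, Map<String, String>> pagesByWeblog = new ConcurrentHashMap<>();

    /** Cached rendering of a page, or null on a miss. */
    String get(String handle, String pageKey) {
        Map<String, String> pages = pagesByWeblog.get(handle);
        return pages == null ? null : pages.get(pageKey);
    }

    void put(String handle, String pageKey, String renderedHtml) {
        pagesByWeblog.computeIfAbsent(handle, h -> new ConcurrentHashMap<>())
                     .put(pageKey, renderedHtml);
    }

    /** Intended to be called whenever anything in the weblog changes. */
    void invalidate(String handle) {
        pagesByWeblog.remove(handle); // drops every cached page of this weblog
    }
}
```

Because readers vastly outnumber writers, invalidate runs rarely and almost every read is a map hit. A production version would also need to avoid re-caching a page that was rendered just before the change landed.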

And like I said, the page servlet caches only pages so what you said
about 100,000 database queries is not true unless you are storing CSS,
JS and other static resources as Roller page templates -- which you
should not be doing.

- Dave