
Posted to dev@roller.apache.org by "(David) Ming Xia" <da...@ibol.biz> on 2010/05/27 23:59:14 UTC

Resend -- About weblog view data access

Hi, Dave.

   Sorry for the garbled text. I am resending my last mail below.

   This is still about weblog view data access.

   The URL patterns listed in the Roller property rendering.weblogMapper.rollerProtectedUrls all belong to the user account console and will never appear in user-created weblogs, so they are not a concern. What concerns us are requests matching the URI pattern ‘/roller-ui/rendering/resources’, which are declared in theme.xml as <resource/> elements. WeblogRequestMapper validates the weblog handle of an incoming text/html page request, and then validates the handle again for every follow-up request the browser sends for the URLs linked from that page. The validating method is WeblogRequestMapper.isWeblog(String potentialHandle).

  For example, a web page with ten links to CSS, JS and image files produces one request for the page itself and then ten more, eleven requests in total. For each request Roller does the following:

   1. Retrieve a connection instance from the connection pool, or create a new JDBC connection
   2. Retrieve the prepared statement from the server statement cache, or create a prepared statement for the named query
   3. Set the parameter ‘handle’ and execute the SQL query
   4. Get all the data for the specified weblog, including the root category and the other categories
   5. Recycle the connection, or close it and discard it for GC
   6. Create a new weblog object and populate it with the data

   So in this example, a single web page request consumes eleven JDBC connection instances and creates eleven weblog objects just to check whether the weblog exists. If some weblogs on Roller receive a high volume of HTTP requests, the Roller database could easily be overwhelmed and deadlock. With all the later requests queued up, memory usage will hit the ceiling, and the database becomes a single point of failure. Without the database available to validate the weblog handle of every request, and the Last-Modified value of every text/html request, users will see a blank white page that goes nowhere. I believe this is quite likely. Looking at the technical parameters and typical usage of database servers, it is clear that they are not designed for the kind of work Roller currently does in validating every HTTP request.
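To make the arithmetic concrete, here is a minimal Java sketch; it is not Roller code. CountingHandleLookup is a hypothetical stand-in for the query behind WeblogRequestMapper.isWeblog(String potentialHandle), and CachingHandleLookup is a hypothetical wrapper that remembers answers in a local map. The demo counts simulated database queries for one page plus ten linked resources.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical stand-in for the per-request "does this weblog handle exist?" query.
class CountingHandleLookup implements Predicate<String> {
    final AtomicInteger queries = new AtomicInteger();
    private final Set<String> existingHandles;

    CountingHandleLookup(Set<String> existingHandles) {
        this.existingHandles = existingHandles;
    }

    @Override
    public boolean test(String handle) {
        queries.incrementAndGet(); // one simulated database round trip per call
        return existingHandles.contains(handle);
    }
}

// Hypothetical wrapper: remembers each answer so repeated checks skip the "database".
class CachingHandleLookup implements Predicate<String> {
    private final Predicate<String> delegate;
    private final Map<String, Boolean> answers = new ConcurrentHashMap<>();

    CachingHandleLookup(Predicate<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean test(String handle) {
        return answers.computeIfAbsent(handle, delegate::test);
    }
}

class LookupCountDemo {
    // One page request plus one request per linked resource, all for the same weblog.
    static int countQueries(int linkedResources, boolean useCache) {
        CountingHandleLookup database = new CountingHandleLookup(Set.of("myblog"));
        Predicate<String> isWeblog = useCache ? new CachingHandleLookup(database) : database;
        for (int i = 0; i < 1 + linkedResources; i++) {
            isWeblog.test("myblog");
        }
        return database.queries.get();
    }

    public static void main(String[] args) {
        System.out.println(countQueries(10, false)); // prints 11
        System.out.println(countQueries(10, true));  // prints 1
    }
}
```

Note that the cached variant also remembers negative answers, so a weblog created after a failed lookup would stay invisible until its entry is evicted; a real cache would need invalidation and a size bound.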

    I would suggest using a cache for weblog page views. Put simply, Roller should cache weblogs and weblog entries. When Roller users manage their accounts, changes are persisted to the database and then written into the cache. Users' passwords are not cached, for security reasons. When viewers retrieve web content, everything they see comes from the cache; they should never touch the database. Data such as referrer addresses or hit counts would be kept in the cache and persisted to the database when the server stops, or on an administrator's command.
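The proposed split between the author path and the viewer path can be sketched as a read-through, write-through cache. Everything here is hypothetical: WeblogStore stands in for the database, and pages are plain strings rather than Roller's weblog and entry objects. Hit counts are buffered in memory and only pushed to the store by flushHits(), which would be called at shutdown or on an administrator's command.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical persistence boundary; in Roller this role belongs to the JPA layer.
interface WeblogStore {
    Optional<String> loadContent(String handle);
    void saveContent(String handle, String content);
    void addHits(String handle, long hits);
}

// In-memory stand-in for the database that counts how many reads reach it.
class InMemoryWeblogStore implements WeblogStore {
    final Map<String, String> rows = new ConcurrentHashMap<>();
    final Map<String, Long> hits = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger();

    public Optional<String> loadContent(String handle) {
        loads.incrementAndGet();
        return Optional.ofNullable(rows.get(handle));
    }

    public void saveContent(String handle, String content) { rows.put(handle, content); }

    public void addHits(String handle, long n) { hits.merge(handle, n, Long::sum); }
}

class WeblogViewCache {
    private final WeblogStore store;
    private final Map<String, String> content = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> pendingHits = new ConcurrentHashMap<>();

    WeblogViewCache(WeblogStore store) { this.store = store; }

    // Viewer path: answered from the cache; the store is read only on a cache miss.
    Optional<String> view(String handle) {
        String page = content.get(handle);
        if (page == null) {
            page = store.loadContent(handle).orElse(null);
            if (page != null) content.put(handle, page);
        }
        if (page != null) {
            pendingHits.computeIfAbsent(handle, h -> new AtomicLong()).incrementAndGet();
        }
        return Optional.ofNullable(page);
    }

    // Author path: persist to the store first, then refresh the cached copy (write-through).
    void save(String handle, String newContent) {
        store.saveContent(handle, newContent);
        content.put(handle, newContent);
    }

    // For server shutdown or an administrator's command: push buffered hit counts.
    void flushHits() {
        for (Map.Entry<String, AtomicLong> e : pendingHits.entrySet()) {
            long n = e.getValue().getAndSet(0); // atomically take and reset the counter
            if (n > 0) store.addHits(e.getKey(), n);
        }
    }
}
```

Only successful lookups are cached here, so requests for unknown handles still reach the store, and this map is local to one JVM.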

   The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables; they are not distributed, and there is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server instances. Yet most production environments nowadays are clustered, composed of multiple JVMs and application server runtimes.
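The Last-Modified problem can be illustrated with a hypothetical two-node sketch. ClusterNode answers a conditional GET by comparing the client's If-Modified-Since header with its own record of when the weblog changed. When each node keeps a private map, an update recorded on one node is invisible to the other, which then answers 304 for stale content. When both nodes share one map they agree; in the sketch that shared map is a plain in-process map standing in for a distributed cache, which a real cluster of JVMs would need instead.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Map;

// Hypothetical application node answering conditional GETs for weblog pages.
class ClusterNode {
    // Either this node's private map, or one map shared by every node
    // (standing in for a distributed cache).
    private final Map<String, Instant> lastModified;

    ClusterNode(Map<String, Instant> lastModified) { this.lastModified = lastModified; }

    void weblogUpdated(String handle, Instant when) { lastModified.put(handle, when); }

    // Returns the HTTP status this node would send: 304 Not Modified or 200 OK.
    int conditionalGet(String handle, String ifModifiedSince) {
        // HTTP dates carry whole seconds, so compare at that precision.
        Instant modified = lastModified.getOrDefault(handle, Instant.EPOCH)
                .truncatedTo(ChronoUnit.SECONDS);
        if (ifModifiedSince != null) {
            Instant since = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            if (!modified.isAfter(since)) {
                return 304;
            }
        }
        return 200;
    }

    // Formats an instant as an HTTP date, e.g. "Wed, 26 May 2010 20:30:00 GMT".
    static String httpDate(Instant t) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(t.atZone(ZoneOffset.UTC));
    }
}
```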

   I have learned that Ehcache supports a distributed map, and I know that the WebSphere cache instance implements IBM's distributed map. The best solution for Roller would be an interface to a third-party distributed cache accessed through a JNDI lookup; failing that, bundling Ehcache with Roller would also be very good.
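A pluggable cache obtained through JNDI might look roughly like this. WeblogCache, LocalWeblogCache, WeblogCacheFactory and the JNDI name in the usage note are all hypothetical, not Roller, Ehcache or WebSphere APIs; a real deployment would bind an adapter that wraps the vendor's distributed map under that name. Outside a container the lookup fails with a NamingException and the factory falls back to a local map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical minimal cache contract that a distributed product could be adapted to.
interface WeblogCache {
    Object get(String key);
    void put(String key, Object value);
    void remove(String key);
}

// Fallback: a per-JVM map, i.e. roughly what a local cache provides today.
// ConcurrentHashMap rejects null keys and values, so callers must not cache nulls.
class LocalWeblogCache implements WeblogCache {
    private final Map<String, Object> map = new ConcurrentHashMap<>();
    public Object get(String key) { return map.get(key); }
    public void put(String key, Object value) { map.put(key, value); }
    public void remove(String key) { map.remove(key); }
}

final class WeblogCacheFactory {
    private WeblogCacheFactory() {}

    // Uses an adapter bound in JNDI by the container if present; otherwise a local cache.
    static WeblogCache create(String jndiName) {
        try {
            Object bound = new InitialContext().lookup(jndiName);
            if (bound instanceof WeblogCache) {
                return (WeblogCache) bound;
            }
        } catch (NamingException e) {
            // No JNDI provider, or nothing bound under that name: fall back below.
        }
        return new LocalWeblogCache();
    }
}
```

In a container one would bind the adapter under a name such as java:comp/env/cache/weblogCache (hypothetical) and call WeblogCacheFactory.create with that name; the rest of the code only ever sees the WeblogCache interface.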

Thank you.

David

--- On Wed, 5/26/10, (David) Ming Xia <da...@ibol.biz> wrote:

From: (David) Ming Xia <da...@ibol.biz>
Subject: About weblog view data access
To: user@roller.apache.org, "Mailing List Apache Roller Developer" <de...@roller.apache.org>
Date: Wednesday, May 26, 2010, 8:30 PM


--- On Wed, 5/26/10, Dave <sn...@gmail.com> wrote:

From: Dave <sn...@gmail.com>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Wednesday, May 26, 2010, 7:59 AM

On Wed, May 26, 2010 at 12:11 AM, (David) Ming Xia
<da...@ibol.biz> wrote:
>    I took a look into it and found another place that issues very intensive database queries.
>
>    RequestMappingFilter.doFilter() --> WeblogRequestMapper.handleRequest().
>
>   RequestMappingFilter's URL mapping is /*, so it checks every HTTP request.
>
>   WeblogRequestMapper.handleRequest() verifies ALL requests with named JPA queries, I mean including requests for CSS, JS and image files.
>
>
>   Actually, both PageServlet and RequestMappingFilter query the weblog by handle. It looks like the database is used as a hashtable in these two functions, whereas a database is usually used for account data transactions and relational data management.
>
>   Now for each web page request there are at least eleven database queries: one for the text/html content in PageServlet, and ten in the mapping filter for everything including the text/html.
>
>   I feel that there could be even more database calls, since many people work on Roller and everyone tends to add a few more.
>
>    It seems that there should be a top-down design solution for this issue.
>
>     I'd like to hear from you.

Hi David,

You are correct, WeblogRequestMapper is invoked on every request, but
does nothing when it encounters URLs that begin with these patterns:

   rendering.weblogMapper.rollerProtectedUrls=\
   roller-ui,images,theme,themes,CommentAuthenticatorServlet,\
   index.jsp,favicon.ico,robots.txt,\
   page,flavor,rss,atom,language,search,comments,rsd,resource,xmlrpc,planetrss

It ignores static theme resources (images, CSS, JS, etc.) and
everything else that is not dynamically generated by a weblog page
template. Perhaps the problem is not quite as bad as you think.
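The short-circuit described above can be sketched as follows. This is a hypothetical simplification, not the actual WeblogRequestMapper code: it assumes the check looks only at the first path segment of the request and compares it with the comma-separated value of rendering.weblogMapper.rollerProtectedUrls, after java.util.Properties has joined the backslash-continued lines.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the protected-URL check; not the real WeblogRequestMapper.
class ProtectedUrlCheck {
    private final Set<String> protectedPrefixes;

    // propertyValue: the joined value of rendering.weblogMapper.rollerProtectedUrls
    ProtectedUrlCheck(String propertyValue) {
        this.protectedPrefixes = Arrays.stream(propertyValue.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // True when the first path segment is protected, so no weblog handle lookup is needed.
    boolean isProtected(String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        int slash = p.indexOf('/');
        String firstSegment = slash >= 0 ? p.substring(0, slash) : p;
        return protectedPrefixes.contains(firstSegment);
    }
}
```

Under this assumption, a request such as /roller-ui/rendering/resources/... never reaches the handle query, while /myblog/entry/... does.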

There have not been that many people working on Roller and the ones
that have worked on the code have been pretty disciplined about when
database calls are made. But of course, even disciplined developers
make mistakes. I'm sure there is much room for improvement and I
encourage you to continue your research into performance bottlenecks.

If you have a proposal for a top-down solution, or some patches to
improve things -- I'd be happy to review them or even commit them for
you if they look good.

- Dave

Re: Resend -- About weblog view data access

Posted by Dave <sn...@gmail.com>.

On Thu, May 27, 2010 at 5:59 PM, (David) Ming Xia
<da...@ibol.biz> wrote:
>    The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables; they are not distributed, and there is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server instances. Yet most production environments nowadays are clustered, composed of multiple JVMs and application server runtimes.

That's not completely true. Roller has pluggable page caching, and
you can plug in memcached if you want a distributed cache. Code is
available on roller.dev.java.net for the Roller Memcache plugin --
it's not part of Roller because, I think, there is some LGPL
dependency.

For caching of database results, in the past we have used Hibernate's
L2 cache feature, which can also be backed by memcached for
distributed cache. Roller has since switched to OpenJPA, but OpenJPA
also has a pluggable cache.

I would recommend pursuing OpenJPA L2 cache. It would be better if
Roller does not have to implement object caching but can instead rely
on the persistence engine to do that.
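For reference, OpenJPA enables its L2 data cache and query cache through persistence-unit properties. The sketch below only builds that property map; OpenJpaCacheConfig is a hypothetical helper and the peer addresses in the example are placeholders. The sjvm remote commit provider keeps cache invalidation within one JVM, while a cluster would use a multi-node provider such as OpenJPA's tcp one, so that a commit on one node evicts stale entries on the others.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that assembles OpenJPA L2 cache properties.
class OpenJpaCacheConfig {
    static Map<String, String> l2CacheProperties(boolean clustered, String peerAddresses) {
        Map<String, String> props = new HashMap<>();
        props.put("openjpa.DataCache", "true");   // cache entity state across EntityManagers
        props.put("openjpa.QueryCache", "true");  // cache query results as well
        // "sjvm": commit notifications stay inside this JVM.
        // "tcp(Addresses=a;b)": commits are broadcast to the listed peers.
        props.put("openjpa.RemoteCommitProvider",
                clustered ? "tcp(Addresses=" + peerAddresses + ")" : "sjvm");
        return props;
    }
}
```

The resulting map would typically be passed to Persistence.createEntityManagerFactory(unitName, properties), or the same entries written as <property> elements in persistence.xml.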

- Dave