Posted to cvs@cocoon.apache.org by sh...@apache.org on 2002/06/27 17:52:50 UTC

cvs commit: xml-cocoon2/src/documentation/xdocs/howto howto-paginator-transformer.xml

shannon     2002/06/27 08:52:50

  Added:       src/documentation/xdocs/howto
                        howto-paginator-transformer.xml
  Log:
  New How-To for the Paginator
  Transformer based on content
  originally posted by Stefano
  to cocoon-dev on 06/06/2002.
  I expanded it a bit to make it
  more practically useful for newbies.
  There remain issues with the
  samples also created, so this
  is a low-profile commit. I.e.
  please don't announce its
  availability. I need some
  feedback from Stefano before
  announcing...
  
  Revision  Changes    Path
  1.1                  xml-cocoon2/src/documentation/xdocs/howto/howto-paginator-transformer.xml
  
  Index: howto-paginator-transformer.xml
  ===================================================================
  <?xml version='1.0' encoding='ISO-8859-1'?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
  "../dtd/document-v10.dtd">
  <document>
  	<header>
  		<title>How to use the Paginator Transformer</title> 
  		<authors>
  			<person name="Stefano Mazzocchi" email="stefano@apache.org" />
  		</authors>
  	</header>
  <body>
  	<s1 title="Overview">
    <p>
  This How-To describes how to use Cocoon's Paginator Transformer component. You can consider it a 'FilterTransformer' on pagination steroids. The Paginator Transformer filters specific data and counts pages as it transforms SAX events. It implements pagination rules based on easy-to-configure pagesheet documents.
    </p>
  	</s1>
  	<s1 title="Purpose">
    <p>
  XSLT-based approaches to pagination are problematic. First of all, it's somewhat complex to define the necessary declarative logic in XSLT. Additionally, an XSLT solution is rarely reusable across different pagination use cases. These problems spurred the creation of the Paginator Transformer. You can quickly add pagination capabilities to your webapp once you have configured a few simple rules within a single configuration file, the pagesheet.
    </p>
    <p>
  The Paginator Transformer works quite nicely for use cases involving a few tens of pages and, of course, for static generation of any number of pages. However, the Transformer must process an entire file before it can extract even a single page. Therefore, you are <strong>strongly</strong> advised against using it for books or other large documents on dynamic sites. Nevertheless, its output is cacheable. Thus, if the same page is requested, then the document will be reprocessed by the Transformer only when it has changed.
    </p>
  	</s1>
  	<s1 title="Intended Audience">
    <p>
  Cocoon users who need pagination capabilities for their web documents. This includes frustrated users who are tired of implementing complex, XSLT-based approaches to pagination.
    </p>
  	</s1>
  	<s1 title="Prerequisites">
    <p>
  Make sure you have version 2.0.3 or greater of Cocoon. The PaginatorTransformer component source is located in the scratchpad area. Therefore, you need to use the following command to build a deployable cocoon.war which includes the scratchpad libraries.
    </p>
  		<source>
  ./build.sh -Dinclude.webapp.libs=yes -Dinclude.scratchpad.libs=yes webapp 
  		</source>
    <p>
  During the build process, the necessary configuration details for the PaginatorTransformer component are automatically copied to cocoon.xconf of cocoon.war. This means that you don't need to manually configure cocoon.xconf. However, if you are adding the paginator samples to a Cocoon webapp that was <strong>not</strong> generated by the above build command, add the following snippet to your cocoon.xconf file, located in the WEB-INF directory of your deployed webapp.
    </p>
      <source><![CDATA[
    <component 
       class="org.apache.cocoon.transformation.pagination.Paginator"
       role="org.apache.cocoon.transformation.pagination.Paginator"
     />]]></source>
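    <p>
  The sitemap snippets in this How-To select the transformer with <code>type="paginator"</code>, so the sitemap also needs a transformer declared under that name. The following is only a minimal sketch of such a declaration. It assumes that the class named in the cocoon.xconf snippet above is also the transformer class; check the sitemap shipped with the samples for the actual declaration.
    </p>
      <source><![CDATA[
  <map:transformers default="xslt">
    <!-- existing transformer declarations stay as they are -->

    <!-- Assumption: class name taken from the cocoon.xconf snippet above -->
    <map:transformer name="paginator"
       src="org.apache.cocoon.transformation.pagination.Paginator"/>
  </map:transformers>
      ]]></source>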
    <p>
  Sample files related to this How-To are also copied during the build process to the Cocoon webapp at webapp/samples/mount/paginator.
    </p>
  	</s1>
  	<s1 title="Steps">
    <p>
  Let's start with a simple example.
    </p>
  		<s2 title="Simple Example">
  	  <p>
  Suppose you have an XML file, document.xml, as follows. 
  	  </p>
  	  
      <source><![CDATA[
  <?xml version="1.0"?>
  <images>
      <image />
      <image />
      <image />
      <image />
      <image />
      <image />
      <image />
  </images>
      ]]></source>
  			<fixme author="DS">
  Perhaps we need a DTD for the pagesheet to help readers visualize the pagesheet.
  			</fixme>
    <p>
  First, you need to write a <strong>pagesheet</strong>. Just as a stylesheet contains instructions for an XSLT processor, a pagesheet contains instructions for the paginator filter. Here is the pagesheet DTD.
    </p>
    
      <source><![CDATA[
  <!ELEMENT pagesheet (items?, rules)*>
  <!ATTLIST pagesheet xmlns CDATA #IMPLIED>
  
  <!ELEMENT items (group)>
  
  <!ELEMENT group EMPTY >
  <!ATTLIST group 
     name CDATA #IMPLIED 
     element CDATA #IMPLIED >
  
  <!ELEMENT rules (link?, count?)*>
  
  <!ELEMENT count EMPTY >
  <!ATTLIST count 
     type ( element | char ) #REQUIRED 
     num CDATA #REQUIRED 
     name CDATA #IMPLIED 
     namespace CDATA #IMPLIED 
   >    
      ]]></source>
  
    <p>
  Let's say you want to paginate document.xml content based on a rule of three &lt;image&gt; elements per page. Here's a sample pagesheet, images.xml, which does just that.
    </p>
  				
      <source><![CDATA[
  <?xml version="1.0"?>
  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
  	<rules>
  		<count type="element" name="image" num="3" />
  	</rules>
  </pagesheet>
      ]]></source>
  
  	  <p>
  You process a source file through a pagesheet filter in a sitemap snippet like this:
  	  </p>
  	  
      <source><![CDATA[
  <map:match pattern="page(*)">
    <map:generate src="document.xml"/>
    <map:transform src="pagesheets/images.xml" type="paginator">
  	<map:parameter name="page" value="{1}"/>
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>
      ]]></source>
      
  	  <p>
  Accessing the URI for page one, page(1) ( e.g. http://localhost:8080/cocoon/mount/paginator/page(1) ) yields:
  	  </p>
  	  
      <source><![CDATA[
  <?xml version="1.0" encoding="UTF-8" ?> 
  <images xmlns:page="http://apache.org/cocoon/paginate/1.0">
    <image /> 
    <image /> 
    <image /> 
    <page:page 
       current="1" 
       total="3" 
       current-uri="/cocoon/mount/paginator/page(1)" 
       clean-uri="/cocoon/mount/paginator/page" /> 
  </images>
  
  
      ]]></source>
  
  	  <p>
  Clearly the above XML could have been transformed into something more meaningful. Note that the transformer must process all pages to obtain the value of <code>total</code>. Currently, there is no way to avoid this.
  	  </p>
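  	  <p>
  As a rough sketch of how the paginated XML could be turned into something more meaningful, an XSLT step can be added to the same pipeline after the paginator. The stylesheet name, stylesheets/page2html.xsl, is hypothetical and not part of the samples; the <code>xslt</code> transformer and <code>html</code> serializer names assume a standard Cocoon sitemap.
  	  </p>
      <source><![CDATA[
  <map:match pattern="page(*)">
    <map:generate src="document.xml"/>
    <map:transform src="pagesheets/images.xml" type="paginator">
      <map:parameter name="page" value="{1}"/>
    </map:transform>
    <!-- Hypothetical stylesheet: renders the page content and page:page as HTML -->
    <map:transform src="stylesheets/page2html.xsl" type="xslt"/>
    <map:serialize type="html"/>
  </map:match>
      ]]></source>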
  			</s2>
  			
  			<s2 title="Adding Navigation">
  		  <p>
  Given that the Paginator has a full-blown pagesheet language, there's even more we can accomplish; most importantly, navigation.
  		  </p>
  		  <p>
  As an example, consider the following pagesheet, images2.xml.
  		  </p>
  		  
      <source><![CDATA[
  <?xml version="1.0"?>
  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
    <items>
      <group name="item" element="images" />
    </items>
    <rules>
      <count type="element" name="image" num="3"/>
      <link type="unit" num="1"/>
    </rules>
  </pagesheet>
  
      ]]></source>
      
    <p>
  The pagesheet rules demonstrate that the transformer understands how the page was encoded in the given URI request, i.e., that parentheses surround the value of page. They also reveal that the transformer can provide navigation links to available pages, in this case, plus or minus one position.
    </p>
  	<fixme author="DS">
  In the above paragraph, you say the transformer understand how the page was encoded. How? I don't see the evidence until the snippet produced below.
  	</fixme>
    <p>
  In your sitemap.xmap file, change the pagesheet source to images2.xml as follows:
    </p>
  		  
      <source><![CDATA[
  <map:match pattern="page(*)">
  	<map:generate src="document.xml" />
  	<map:transform src="pagesheets/images2.xml" type="paginator">
  		<map:parameter name="page" value="{1}" />
  	</map:transform>
  	<map:serialize type="xml" />
  </map:match>
      ]]></source>
      
    <p>
  Processing the same page(1) request yields the following (pretty-printed for this document):
    </p>
    
      <source><![CDATA[
  <?xml version="1.0"?>
  <images xmlns:page="http://apache.org/cocoon/paginate/1.0">
    <image /> 
    <image /> 
    <image /> 
    <page:page 
      current="1" 
      total="3" 
      current-uri="/cocoon/mount/paginator/page(1)" 
      clean-uri="/cocoon/mount/paginator/page">
    <page:link 
       type="next" 
       uri="/cocoon/mount/paginator/page(2)" 
       page="2" /> 
    </page:page>
  </images>
  
      ]]></source>
  
    <p>
  This result demonstrates:
    </p>
  <ul>
  	<li>
  Page 0 does not exist, so no &lt;page:link&gt; is created for a previous page.
  	</li>
  	<li>
  Page 2 exists, so &lt;page:link&gt; is created, along with 
  		<ul>
  		  <li>
  a value of "next" for its type attribute (useful for visualization), and
  		  </li>
  		  <li>
  a value ending in page(2) for its uri attribute (useful for linking without XSLT-specific logic)
  		  </li>
  		</ul>
  		</li>
  		</ul>
    <p>
  Note that the URI is re-encoded using the same parentheses pattern, page(2).
    </p>
    <p>
  Now, without changing anything, requesting page(2) yields the following.
    </p>
    
      <source><![CDATA[
  <?xml version="1.0"?>
  <images xmlns:page="http://apache.org/cocoon/paginate/1.0">
    <image /> 
    <image /> 
    <image /> 
    <page:page 
        current="2" 
        total="3" 
        current-uri="/cocoon/mount/paginator/page(2)" 
        clean-uri="/cocoon/mount/paginator/page">
      <page:link type="prev" uri="/cocoon/mount/paginator/page(1)" page="1" /> 
      <page:link type="next" uri="/cocoon/mount/paginator/page(3)" page="3" /> 
    </page:page>
  </images>
      ]]></source>
      
    <p>
  And requesting page(3) yields the following.
    </p>
    
      <source><![CDATA[
  <?xml version="1.0"?>
  <images xmlns:page="http://apache.org/cocoon/paginate/1.0">
    <image /> 
    <page:page 
       current="3" 
       total="3" 
       current-uri="/cocoon/mount/paginator/page(3)" 
       clean-uri="/cocoon/mount/paginator/page">
    <page:link 
       type="prev" 
       uri="/cocoon/mount/paginator/page(2)" 
       page="2" /> 
    </page:page>
  </images>]]></source>
  
    <p>
  Note that this page contains only one &lt;image&gt; element. The original document, document.xml, contained only seven &lt;image&gt; elements: three for page one, three for page two, and one for page three. Thus, the last page holds the remainder of the division (7 mod 3 = 1).
    </p>
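    <p>
  Because each &lt;page:link&gt; already carries a ready-made uri attribute, a stylesheet can render navigation without any pagination logic of its own. The following XSLT is only a sketch under that assumption; the HTML layout is invented for illustration, and the element and attribute names are taken from the output shown above.
    </p>
      <source><![CDATA[
  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:page="http://apache.org/cocoon/paginate/1.0"
      exclude-result-prefixes="page">

    <!-- Wrap the paginated content in a minimal HTML page. -->
    <xsl:template match="/images">
      <html>
        <body>
          <p>
            <xsl:value-of select="count(image)"/>
            <xsl:text> image(s) on this page</xsl:text>
          </p>
          <xsl:apply-templates select="page:page"/>
        </body>
      </html>
    </xsl:template>

    <!-- Render "Page N of M", then the previous links, then the next links. -->
    <xsl:template match="page:page">
      <p>
        <xsl:text>Page </xsl:text>
        <xsl:value-of select="@current"/>
        <xsl:text> of </xsl:text>
        <xsl:value-of select="@total"/>
      </p>
      <p>
        <xsl:apply-templates select="page:link[@type='prev']"/>
        <xsl:apply-templates select="page:link[@type='next']"/>
      </p>
    </xsl:template>

    <!-- Copy the uri computed by the transformer straight into the href. -->
    <xsl:template match="page:link">
      <a href="{@uri}">
        <xsl:value-of select="@type"/>
        <xsl:text> (page </xsl:text>
        <xsl:value-of select="@page"/>
        <xsl:text>)</xsl:text>
      </a>
      <xsl:text> </xsl:text>
    </xsl:template>

  </xsl:stylesheet>
      ]]></source>
    <p>
  On the first and last pages, the missing prev or next link simply produces no anchor, since the transformer emits no &lt;page:link&gt; for pages that do not exist.
    </p>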
  	</s2>
  	
  	<s2 title="Real-Life Examples">
    <p>
  Here are a few pagesheet examples which are a bit more complex.
    </p>
  		<s3 title="DirectoryGenerator Pagination">
    <p>
  Here's an example of paginating the contents of a directory using the DirectoryGenerator.
    </p>
    
      <source><![CDATA[
  <?xml version="1.0"?>
  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
  	<rules>
  		<count type="element" name="file" 
  		  namespace="http://apache.org/cocoon/directory/2.0" 
  		  num="16" />
  		<link type="unit" num="2" />
  		<link type="range" value="5" />
  	</rules>
  </pagesheet>
      ]]></source>
  		
    <p>
  The rules state:
    </p>
  	<ol>
  		<li>
  paginate 16 files per page
  		</li>
  		<li>
  provide links to +/- 1 and +/- 2 pages (when available)
  		</li>
  		<li>
  provide links to +/- 5 (when available)
  		</li>
  	</ol>
  	  <p>
  So, suppose we have a directory with 300 files. If we request page 10, the generated page will be:
  	  </p>
  	  
      <source><![CDATA[
  <?xml version="1.0"?>
  <dir:directory>
  	<dir:file ... />
  		[other 15 dir:file] 
  	<page:page 
  	     xmlns:page="http://apache.org/cocoon/paginate/1.0" 
  	     current="10" 
  	     total="19" 
  	     current-uri="dir(10)" 
  	     clean-uri="dir" >
  	  <page:range-link page="5" type="prev" uri="dir(5)" />
  	  <page:link page="8" type="prev" uri="dir(8)" />
  	  <page:link page="9" type="prev" uri="dir(9)" />
  	  <page:link page="11" type="next" uri="dir(11)" />
  	  <page:link page="12" type="next" uri="dir(12)" />
  	  <page:range-link page="15" type="next" uri="dir(15)" />
  	</page:page>
  </dir:directory>
      ]]></source>
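    <p>
  A pipeline for this case might look like the following sketch. The match pattern follows the <code>current-uri="dir(10)"</code> value in the output above. The directory path (images/) and the pagesheet file name (pagesheets/directory.xml) are hypothetical, and the generator name <code>directory</code> assumes a standard Cocoon sitemap.
    </p>
      <source><![CDATA[
  <map:match pattern="dir(*)">
    <!-- Hypothetical directory to list -->
    <map:generate type="directory" src="images/"/>
    <!-- Hypothetical file name for the pagesheet shown above -->
    <map:transform src="pagesheets/directory.xml" type="paginator">
      <map:parameter name="page" value="{1}"/>
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>
      ]]></source>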
      
  			</s3>
  			<s3 title="Asymmetric pagination">
    <p>
  We also have the ability to indicate different rules for each page, for example:
    </p>
      <source><![CDATA[
  <?xml version="1.0"?>
  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
  	<rules page="1">
  		<count type="element" name="b" num="5" />
  		<link type="unit" num="1" />
  	</rules>
  	<rules>
  		<count type="element" name="b" num="10" />
  		<link type="unit" num="2" />
  	</rules>
  </pagesheet>
      ]]></source>
  
  	</s3>
  	<s3 title="Count types">
    <p>
  The Paginator Transformer was designed to count. However, it's up to you to define what needs to be counted, either XML elements or characters (not yet implemented). By supplying values to the attributes of &lt;count&gt; in the pagesheet, you can specify exactly what needs to be counted.
    </p>
    
    <p>
  The &lt;count&gt; element has two required and two optional attributes. The required attributes are:
    </p>
    <ul>
       <li>
  <strong>type</strong> the method of counting the paginator should perform, either elements or characters. 
  When element is specified, the transformer counts startElement() SAX events. When chars is specified (currently not implemented), the transformer counts the primitive data type char.
       </li>
       <li>
  <strong>num</strong> a number which specifies how many times the counted item (element or chars) must be present within the transformed page.
       </li>
     </ul>
    <p>
  Optional attributes (when type="element" is specified) are:
    </p>
    <ul>
        <li>
  <strong>name</strong> the name of the element, without any namespace prefix
        </li>
        <li>
  <strong>namespace</strong> the URI of the namespace. If not specified, the default namespace is used.
        </li>
     </ul>
  		</s3>
  	 </s2>
    </s1>
    
    <s1 title="Improving the Paginator Transformer" >
  
    <p>
  The PaginatorTransformer was developed, initially, to paginate a directory listing. It 
  works great when it paginates by counting elements, particularly elements which contain similar amounts of content to be displayed on pages. With documents, for example, it could paginate by counting sections or subsections. However, bear in mind that this approach does not always guarantee visually-balanced web pages.
    </p>
    
  <s2 title="Nested Pagination" >
  
    <p>
  Furthermore, simply counting elements is not always simple. Consider the following:
    </p>
    
      <source><![CDATA[
  <?xml version="1.0"?>
   <a>
    <b>
     <a>
      <b>
       <a>
        <b/>
       </a>
      </b>
     </a>
    </b>
   </a>
      ]]></source>
      
    <p>
  Let's say you want to paginate using one &lt;b&gt; per page. What should the transformed pages look like? Here are a few possible outcomes. Which one is the best? 
    </p>
    
    <s3 title="Page 1" >
      <source><![CDATA[
  <?xml version="1.0"?>
   <a>
    <b>
     <a>
      <a/>
     </a>
    </b>
   </a>
      ]]></source>
    </s3>
    
    <s3 title="Page 2" >
      <source><![CDATA[
  <?xml version="1.0"?>
   <a>
    <a>
     <b>
      <a/>
     </b>
    </a>
   </a>
      ]]></source>
    </s3>
    
    <s3 title="Page 3" >
      <source><![CDATA[
  <?xml version="1.0"?>
   <a>
    <a>
     <a>
      <b/>
     </a>
    </a>
   </a>
      ]]></source>
    </s3>
    
     <p>
  It appears the current code is buggy somewhere. With deep
  nesting as in this example, some SAX events are lost. This 
  creates a non-well-formed SAX stream which chokes subsequent
  transformers, such as XSLT, which may be sensitive to well-formedness.
    </p>
    
    <p>
  Does the above look like a mental exercise to you? Perhaps, but consider 
  the structure of the Cocoon Project's Document DTD 1.1, which includes nested &lt;section&gt; elements. Similar problems will emerge when paginating documents based on this DTD. 
  It isn't clear whether the solution adopted above is meaningful for real-world pagination. Suggestions on this are welcome.
    </p>
    
    </s2>
    
  <s2 title="Character-based Pagination" >
  
    <p>
  Given the need to visually balance pages, a counting method for characters was added, even though it isn't implemented yet. Counting by characters is especially difficult when you think about the algorithms that perform chunking. 
    </p>
    
    <p>
  Assume you have a document like this:
    </p>
    
      <source><![CDATA[
   <p>this is some <strong>text</strong> that happens 
   to be <em>chuncked</em></p>
               ^
      ]]></source>
      
    <p>
  Suppose that paginating by counting the chars results in a chunking point
  indicated by the caret above (between the letters u and n). 
  Ending a page at that position results in XML that is not well-formed as well as
  truncated words. Even if you find a way to provide well-formed XML, 
  you still must deal with word-break issues. Therefore, we need a way to 
  produce well-formed XML by continuing until the first 'block-level' element is encountered, for example, 
  &lt;p&gt; in this case. However, this means that the pagesheet must contain a list of
  such 'block-delimiting' elements. Currently, the Pagesheet parser and
  object model do <strong>not</strong> support this notion.
    </p>
    
    <p>
  Conclusion? Pagination at the char level is not trivial and will require a
  little bit of additional work on the transformer.
    </p>
    
    </s2>
    
  <s2 title="Other Improvements" >
  
    <p>
  One possible way to improve the concept is to count by XPath results. For example,
  you may want to count &lt;section&gt; elements included in other &lt;section&gt; elements. Another way to improve the design is to allow booleans to be used within counting
  rules. For example, you could count &lt;section&gt; AND &lt;chapter&gt; elements. Most likely, XPath will help here as well.
    </p>
    </s2>
    
    </s1>
    <s1 title="Comments">
  <p>
  Care to comment on this How-To? Got ideas on how to improve the Paginator Transformer? Help keep this How-To and the Paginator Transformer relevant by passing along any useful feedback to the author, <link href="mailto:stefano@apache.org">Stefano Mazzocchi</link>.
    </p>
    </s1>
    <s1 title="Revisions">
    <p>
  06-06-02: Content originally posted to cocoon-dev by Stefano Mazzocchi. 
    </p>
    <p>
  06-26-02: Edited and structured by Diana Shannon. Scratchpad samples also added.
    </p>
    </s1>
    </body>
  </document>
  
  
  
