Posted to cvs@cocoon.apache.org by sh...@apache.org on 2002/07/03 22:27:14 UTC
cvs commit: xml-cocoon2/src/documentation/xdocs/howto howto-html-pdf-publishing.xml

shannon     2002/07/03 13:27:14

  Added:       src/documentation/xdocs/howto howto-html-pdf-publishing.xml
  Log:
  New How-To on publishing
  HTML and PDF docs in Cocoon
  by Bertrand Delacretaz
  bdelacretaz@codeconsult.ch
  
  Revision  Changes    Path
  1.1                  xml-cocoon2/src/documentation/xdocs/howto/howto-html-pdf-publishing.xml
  
  Index: howto-html-pdf-publishing.xml
  ===================================================================
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../dtd/document-v10.dtd">
  
  <document>
   <header>
    <title>How to publish XML documents in HTML and PDF</title>
    <authors>
     <person name="Bertrand Delacr&#232;taz" email="bdelacretaz@codeconsult.ch"/>
    </authors>
   </header>
  
   <body>
  
  <s1 title="Overview">
  <p>
  Without requiring any prior knowledge of Cocoon, XSLT or XSL-FO, this How-To shows you how to publish XML 
  documents in HTML and PDF using Cocoon.
  <br/>
  The steps below have been tested with Cocoon 2.0.2-dev but should work with any 2.x version.  
  </p>
  </s1>
  
  <s1 title="Purpose">
  <p>
  We will build a simple pipeline that converts XML documents into HTML or PDF on the fly using simple 
  XSLT transforms.
  <br/>
  This is similar to the <em>hello.html</em> and <em>hello.pdf</em> samples of the standard Cocoon installation, but here you
  will build it yourself, which should give you a better feel for how it works.
  </p>
  </s1>
  
  <s1 title="Intended Audience">
  <p>
  Beginning Cocoon users who want to learn how to publish HTML and/or PDF documents from XML data.
  </p>
  </s1>
  
  <s1 title="Prerequisites">
  <p>Here's what you need:</p>  
  
  <ul>
  <li>Cocoon must be running on your system.</li>
  <li>This document assumes a standard installation where
  <link href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link> points to 
  the <em>mount</em> subdirectory of the Cocoon installation. Calling this URL should display a page
  titled "Directory Listing of mount".
  <br/> 
  If your installation runs on a different URL, you will have to adjust
  the URLs given in this document accordingly. 
  </li>
  <li>You must be able to create and edit XML files in the <em>mount</em> subdirectory of the Cocoon installation.
  In a standard installation, this is <em>webapps/cocoon/mount</em> under the Tomcat installation directory.
  </li>
  </ul>
  <note>You will not need a fancy XML editor for this; copying and pasting the examples into any text editor
  will do.</note>
  
  </s1>
  
  <s1 title="Steps">
  <p>
  Here's how to proceed.
  </p>
  
  <s2 title="1. Create the work directory under mount" >
  <p>
  Under <em>webapps/cocoon/mount</em>, create a new directory named <em>html-pdf</em>. 
  All files used by this How-To will reside in this directory.
  <br/>
  After a browser refresh, <link href="http://localhost:8080/cocoon/mount/">http://localhost:8080/cocoon/mount/</link> 
  should display the name of this new directory, among others. 
  </p>
  </s2>
  
  <s2 title="2. Create the XML example documents" >
  <p>
  To keep it simple we will use two small XML files as our data source.
  Later, you will probably use other data sources like live XML feeds, databases, etc. 
  </p>
  <p>
  In the <em>html-pdf</em> directory, create the following two files, naming them exactly as
  shown.
  </p>
  
  <note>
  Be careful about upper and lower case in filenames if you're working on a Unix or Linux system. 
  On such systems, <em>thisFile.xml</em> is not the same as <em>Thisfile.xml</em>.
  </note>
  <note>
  To avoid any errors, use copy/paste when creating XML documents from examples on this page.
  <br/>
  Also, do not leave spaces or blank lines at the start of XML files: the &lt;?xml... declaration must
  begin at the very first character of the file.
  </note>
  
  <p>
  Contents of file <strong>pageOne.xml</strong>:
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <page>
  <title>This is the pageOne.xml example</title>
  <s1 title="Section one">
      <p>This is the text of section one</p>
  </s1>
  </page>
          ]]></source>
  
  <p>
  Contents of file <strong>pageTwo.xml</strong>:
  </p>
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <page>
  <title>This is the pageTwo.xml example</title>
  <s1 title="Yes, it works">
      <p>Now you're hopefully seeing pageTwo in HTML or PDF</p>
  </s1>
  </page>
          ]]></source>
  
  </s2>
  
  <s2 title="3. Create the XSLT transform for HTML" >
  <p>
  The most common way of producing HTML in Cocoon is to use <em>XSLT transforms</em> to select and convert 
  the appropriate elements of the input documents.
  </p>
  
  <p>
  Copy the file shown below to the <em>html-pdf</em> directory alongside your XML documents, naming it
  <strong>doc2html.xsl</strong>.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  
  <!-- generate HTML skeleton on root element -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:apply-templates select="page/title"/></title>
      </head>
      <body>
          <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  
  <!-- story is used later by the Meerkat example -->
  <xsl:template match="p|story">
      <p><xsl:apply-templates/></p>
  </xsl:template>
  
  <!-- convert sections to HTML headings -->
  <xsl:template match="s1">
      <h1><xsl:apply-templates select="@title"/></h1>
      <xsl:apply-templates/>
  </xsl:template>
  
  </xsl:stylesheet>     
  ]]></source>
  <note>       
  This transform generates an HTML skeleton and converts the input markup to HTML. We won't go
  into details here; our goal is just to show you how the components of the publishing chain are combined.  
  </note>
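  <p>
  As a rough illustration, applying doc2html.xsl to pageOne.xml should yield HTML along these lines. This is
  a sketch that ignores whitespace and anything the HTML serializer may add, such as a DOCTYPE or meta tags.
  Note that the page title also shows up as plain text at the top of the body: no template matches the
  <em>title</em> element, so the XSLT built-in rules simply copy its text.
  </p>
         <source><![CDATA[
  <html>
    <head>
      <title>This is the pageOne.xml example</title>
    </head>
    <body>
      This is the pageOne.xml example
      <h1>Section one</h1>
      <p>This is the text of section one</p>
    </body>
  </html>
          ]]></source>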
  
  </s2>
  
  <s2 title="4. Create the sitemap" >
  <p>
  We now have documents to publish, and an XSLT transform to convert them to our HTML output format.
  What's left is to connect these together when a request is made to Cocoon - that's the role of the <em>sitemap</em>,
  which will select a <em>processing pipeline</em> based on the request received from the browser. 
  </p>
  
  <p>
  To tell Cocoon how we want it to process requests made to <em>html-pdf</em>, 
  copy the following contents to a file named <strong>sitemap.xmap</strong> in the 
  <em>html-pdf</em> subdirectory.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  
      <!-- use the standard components -->
      <map:components>
          <map:generators default="file"/>
          <map:transformers default="xslt"/>
          <map:readers default="resource"/>
          <map:serializers default="html"/>
          <map:selectors default="browser"/>
          <map:matchers default="wildcard"/>
      </map:components>
        
      <map:pipelines>
          <map:pipeline>
              <!-- respond to *.html requests with our docs processed by doc2html.xsl -->
              <map:match pattern="*.html">
                  <map:generate src="{1}.xml"/>
                  <map:transform src="doc2html.xsl"/>
                  <map:serialize type="html"/>
              </map:match>
              
              <!-- later, respond to *.pdf requests with our docs processed by doc2pdf.xsl -->
              <map:match pattern="*.pdf">
                  <map:generate src="{1}.xml"/>
                  <map:transform src="doc2pdf.xsl"/>
                  <map:serialize type="fo2pdf"/>
              </map:match>
          </map:pipeline>
      </map:pipelines>
  </map:sitemap>
          ]]></source>
          
  <note>The important thing here is the first <strong>map:match</strong> element, which tells Cocoon how to process
  requests matching *.html in this directory. Again, we won't go into details here, but that's where the processing is defined.
  </note>
  <note>The above sitemap is already configured for PDF publishing, but the PDF pipeline will not work until we have created
  the required XSLT transform.</note> 
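  <p>
  To see how the wildcard works: the <em>*</em> in the pattern captures part of the requested name, and
  <em>{1}</em> inserts the captured text into the <em>src</em> attribute. For a request to pageOne.html,
  the first match therefore behaves as if it had been written out explicitly, as in the illustrative sketch
  below. This is only meant to show the substitution; you do not need to add it to your sitemap.
  </p>
         <source><![CDATA[
  <!-- illustration only: what the *.html match amounts to for pageOne.html -->
  <map:match pattern="pageOne.html">
      <map:generate src="pageOne.xml"/>
      <map:transform src="doc2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
          ]]></source>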
         
  </s2>
  
  <s2 title="5. Test the HTML publishing" >
  <p>
  At this point you should be able to display the results in HTML: 
  </p>
  <ul>
  <li>
  <link href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.html">http://localhost:8080/cocoon/mount/html-pdf/pageOne.html</link>
  should display the first page with "Section one" in big letters.
  </li>
  <li>
  <link href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.html</link>
  should display the second page with "Yes, it works" in big letters.
  </li>
  </ul>
  <note>If this doesn't work, first double-check the above steps, and then look at the Cocoon
  logs in the webapps/cocoon/WEB-INF/logs directory. You will find lots of information there: look for clues 
  in files that change in size when the error happens.
  </note>
  </s2>
  
  
  <s2 title="6. Create the XSLT transform for PDF" >
  <p>
  PDF documents are created via XSL-FO documents, which are XML documents that use a specific page-description
  vocabulary (see <link href="#references">References</link> below for more info). The actual conversion to PDF is done by the 
  <em>PdfSerializer</em> which uses software from <link href="http://xml.apache.org/fop">FOP</link>, another Apache
  Software Foundation project.   
  </p>
  
  <p>
  To activate the PDF conversion, copy the file shown below to the <em>html-pdf</em> directory alongside your XML documents, naming it
  <strong>doc2pdf.xsl</strong>.
  </p>
  
         <source><![CDATA[
  <?xml version="1.0" encoding="iso-8859-1"?>
  <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
      xmlns:fo="http://www.w3.org/1999/XSL/Format"
  >
      <!-- generate PDF page structure -->
      <xsl:template match="/">
          <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
              <fo:layout-master-set>
                  <fo:simple-page-master master-name="page"
                    page-height="29.7cm" 
                    page-width="21cm"
                    margin-top="1cm" 
                    margin-bottom="2cm" 
                    margin-left="2.5cm" 
                    margin-right="2.5cm"
                  >
                      <fo:region-before extent="3cm"/>
                      <fo:region-body margin-top="3cm"/>
                      <fo:region-after extent="1.5cm"/>
                  </fo:simple-page-master>
  
                  <fo:page-sequence-master master-name="all">
                      <fo:repeatable-page-master-alternatives>
                          <fo:conditional-page-master-reference master-reference="page" page-position="any"/>
                      </fo:repeatable-page-master-alternatives>
                  </fo:page-sequence-master>
              </fo:layout-master-set>
  
              <fo:page-sequence master-reference="all">
                  <fo:flow flow-name="xsl-region-body">
                      <fo:block><xsl:apply-templates/></fo:block>
                  </fo:flow>
              </fo:page-sequence>
          </fo:root>
      </xsl:template>
  
      <!-- process paragraphs -->
      <xsl:template match="p">
          <fo:block><xsl:apply-templates/></fo:block>
      </xsl:template>
  
      <!-- convert sections to XSL-FO headings -->
      <xsl:template match="s1">
          <fo:block font-size="24pt" color="red" font-weight="bold">
              <xsl:apply-templates select="@title"/>
          </fo:block>
          <xsl:apply-templates/>
      </xsl:template>
  
  </xsl:stylesheet>
  ]]>
         </source>
  <note>This file is already referenced by the sitemap that we created, so no additional configuration is needed.</note>       
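  <p>
  As a rough illustration of the intermediate result, the <em>fo:flow</em> part of the XSL-FO document generated
  from pageOne.xml should look something like the sketch below (whitespace aside). The surrounding
  <em>fo:root</em> and <em>fo:layout-master-set</em> elements come straight from the stylesheet. As in the HTML
  version, the page title appears as plain text because no template matches the <em>title</em> element.
  </p>
         <source><![CDATA[
  <fo:flow flow-name="xsl-region-body">
      <fo:block>
          This is the pageOne.xml example
          <fo:block font-size="24pt" color="red" font-weight="bold">Section one</fo:block>
          <fo:block>This is the text of section one</fo:block>
      </fo:block>
  </fo:flow>
          ]]></source>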
  </s2>
  
  <s2 title="7. Test the PDF publishing" >
  <p>
  At this point you should be able to display the results in PDF in addition to the existing HTML versions: 
  </p>
  <ul>
  <li>
  <link href="http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageOne.pdf</link>
  should display the first page with "Section one" in big red letters.
  </li>
  <li>
  <link href="http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf">http://localhost:8080/cocoon/mount/html-pdf/pageTwo.pdf</link>
  should display the second page with "Yes, it works" in big red letters.
  </li>
  </ul>
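  <p>
  If the PDF does not render, one possible way to investigate is to look at the intermediate XSL-FO document
  itself. The sketch below assumes that an <em>xml</em> serializer is declared in the parent sitemap, as is
  the case in a standard Cocoon installation; the <em>*.fo</em> pattern is a name chosen for this example only.
  Added inside the existing <em>map:pipeline</em> element, it should let you request pageOne.fo and view the
  XSL-FO markup that would otherwise be handed to the PDF serializer.
  </p>
         <source><![CDATA[
  <!-- debugging aid (hypothetical pattern name): show the XSL-FO instead of the PDF -->
  <map:match pattern="*.fo">
      <map:generate src="{1}.xml"/>
      <map:transform src="doc2pdf.xsl"/>
      <map:serialize type="xml"/>
  </map:match>
          ]]></source>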
  </s2>
  
  </s1>
  
  <s1 title="Summary">
  <p>
  Hopefully you're beginning to see that this is not too complicated once you know what goes where. 
  <br/>
  The nice thing is that all of our huge corpus
  of XML documents (two documents actually, but that's a start) is processed by just two XSLT files, one
  for each target format.
  <br/> 
  Changing the appearance of the published documents would require changing these XSLT transforms only, without
  touching the source documents.
  </p>
  </s1>
  
  <s1 title="Tips">
  <s2 title="Tip 1: Dynamic XML data">
  <p>
  Using dynamic XML as the data source is very easy as the Cocoon FileGenerator can read URLs as well. 
  <br/>
  If you add the map:match element shown in bold below <strong>before</strong> the existing map:match elements in your sitemap.xmap file, requesting
  <link href="http://localhost:8080/cocoon/mount/html-pdf/meerkat.html">http://localhost:8080/cocoon/mount/html-pdf/meerkat.html</link>
  should display real-time news from Meerkat (assuming an Internet connection to Meerkat is available).
  <br/>
  The news will be displayed in a very rough format, but this can be made better by writing a 
  specific XSLT transform for this Meerkat data and using it instead of doc2html.xsl in the meerkat.html pipeline.  
  </p>
  
  <source>
  <![CDATA[
  ...
  <map:pipeline>
  ]]>
  <strong>
  <![CDATA[
  <map:match pattern="meerkat.html">
      <map:generate src="http://www.oreillynet.com/meerkat/?_fl=xml"/>
      <map:transform src="doc2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>
  ]]>
  </strong>
  <![CDATA[
  <map:match pattern="*.html">
  etc...
  ]]>
  </source>
  </s2>
  
  <s2 title="Tip 2: Two-step conversion">
  <p>
  When you are generating multiple formats from a single data source, it is often a good idea to first generate
  an intermediate <em>logical document</em> that describes the output in a format-neutral way.
  <br/>
  This is obviously not needed in our simple example, but if you're aiming at more complicated 
  publishing tasks you might want to read about this "publishing pattern" in Martin Fowler's 
  <link href="http://www.martinfowler.com/isa/htmlRenderer.html">Two Step View</link>
  article.
  </p>
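  <p>
  In a Cocoon sitemap, such a two-step conversion could be sketched by chaining two transformers in each
  pipeline, as shown below. The stylesheet names <em>doc2logical.xsl</em>, <em>logical2html.xsl</em> and
  <em>logical2pdf.xsl</em> are hypothetical and not part of this How-To; the first would produce the
  format-neutral logical document, and the others would render it in each output format.
  </p>
         <source><![CDATA[
  <!-- sketch only: the three stylesheets named here are assumptions -->
  <map:match pattern="*.html">
      <map:generate src="{1}.xml"/>
      <map:transform src="doc2logical.xsl"/>
      <map:transform src="logical2html.xsl"/>
      <map:serialize type="html"/>
  </map:match>

  <map:match pattern="*.pdf">
      <map:generate src="{1}.xml"/>
      <map:transform src="doc2logical.xsl"/>
      <map:transform src="logical2pdf.xsl"/>
      <map:serialize type="fo2pdf"/>
  </map:match>
          ]]></source>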
  </s2>
  
  </s1>
  
  <s1 title="References">
  <anchor id="references"/>
  <p>
  To go further, you will need to learn about the following technologies and tools:
  </p>
  <ul>
  <li>
  Learning about the 
  <link href="http://www.google.com/search?as_sitesearch=xml.apache.org&amp;as_q=cocoon+concepts+sitemap">
  Cocoon concepts</link> will help you understand how the sitemap, generators, transformers and serializers work.
  </li> 
  <li>
  Learning about <link href="http://www.w3.org/Style/XSL/">XSLT</link> will allow you to write your own transforms to 
  generate HTML, PDF or other formats from XML data. 
  Information about XSL-FO is available at the same address.  
  </li>
  </ul>
  </s1>
  
  <s1 title="Comments">
  <p>
  Care to comment on this How-To? Got another tip? 
  Help keep this How-To relevant by passing along any useful feedback to the author,
  <link href="mailto:bdelacretaz@codeconsult.ch">Bertrand&#160;Delacr&#232;taz</link>.
  </p>
  </s1>
  
  </body>
  </document>
  
  
  
