Posted to fop-users@xmlgraphics.apache.org by Craig Ringer <cr...@postnewspapers.com.au> on 2010/07/05 06:37:30 UTC

Re: Distributing vertical space in a column while repeating column headings

> The short version is that I can't figure out how to distribute vertical
> space to avoid ragged column bottoms in multi-column pages when the flow
> contains several long-ish one-column tables. Why one-column tables?
> Because I have sections with headings that must be repeated at the top
> of a column if split across columns.
>   


A quick follow-up on this: I ended up solving it using my existing
one-column table approach to repeat headings, then post-processing the
area tree to re-distribute the space. I'm most of the way through
implementing insertion of house ads as whitespace filler as part of area
tree post-processing, too.

I've posted the core of the space redistribution code below in case it
helps anyone else. The rest of the code (not shown) is just the usual
stuff to embed fop, generate the area tree to a tempfile, and then
render from a dom to pdf after reprocessing.

Sorry for the mixed HTML/plain post, but in this case it's the best way
to get Thunderbird to maintain my code formatting. Grr stupid mail clients.

I may post the whole lot later if I get permission from the boss to open
source this whole classified/pagination system, which is likely. For
now, just the bits to reprocess the area tree follow, along with a
cut-down version of the PaginatorConfiguration class that provides the
required factories.

This code finds all blocks with a prod-id starting with "ad_" or
"heading_", determines how much free space is in the column, and
distributes that free space evenly among those blocks, adding to any
existing space-before if found. If it adds space to a block, it adds the
same amount of space to the block progression dimension of all
containing parent elements up to and including the <flow> that contains
the column. To work, it requires that blocks that should receive
distributed space be labeled with a suitable "id" attribute like
"ad_bobsmowing" or "heading_forsale" in the XSL-FO.

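As a rough illustration of the arithmetic described above, here is a minimal
sketch with made-up numbers. The class and method names are hypothetical and
are not part of the code posted below; the values stand in for the "bpd"
attributes of a span and its flow.

```java
final class SpaceDistributionSketch {

    /** Free space in a column: span (column) height minus the height used by the flow. */
    static int freeSpace(int spanBpd, int flowBpd) {
        return spanBpd - flowBpd;
    }

    /**
     * Extra space-before for each padded block. The first block in the
     * column gets nothing, so the free space is shared among the rest.
     * Returns 0 when there are fewer than two blocks.
     */
    static double extraSpacePerBlock(int spanBpd, int flowBpd, int blockCount) {
        int blocksToPad = blockCount - 1;
        if (blocksToPad <= 0) {
            return 0.0;
        }
        return (double) freeSpace(spanBpd, flowBpd) / blocksToPad;
    }

    public static void main(String[] args) {
        // Hypothetical column: span bpd 1000000, flow bpd 940000, four blocks.
        System.out.println(extraSpacePerBlock(1000000, 940000, 4)); // prints 20000.0
    }
}
```

With 60000 units free and three blocks to pad, each padded block gets 20000
added to its space-before, and each enclosing element up to the flow grows by
the same amount per padded block.
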
My app produces the XSL-FO with some XSLT, from simple input in a format
like the following:


      sample_ad_input.xml



<ads>
  <section class_no="1700">
    <heading>FOR SALE HOUSEHOLD</heading>
    <ad adname="BBQ 4BURNER GAS"><adbody><b>BBQ</b> 4-burner gas, good
cond $80 ONO. 9999 9999.</adbody></ad>
  </section>
  <section class_no="1725">
    <heading>DANCE</heading>
    <ad adname="LATIN AMERICAN ">
      <adbody><b>LATIN</b> American and Social Dancing. Learn all the
popular dances, Cha Cha, Jive, Rumba, Waltz, Quickstep, foxtrot ....
Private and Wedding lessons available.</adbody>
    </ad>
  </section>
</ads>

... but of course your needs would differ. I'm just showing how the
space is redistributed in case others have this problem.

In reality the XML is generated on demand by queries against a
PostgreSQL database containing the ads, but that doesn't matter much for
this purpose.

A cut-down version of the XSLT to transform the above into FO is:


      ads_to_fo.xsl


<?xml version="1.0"?>

<!--
REQUIREMENTS:
 - An XSLT processor
 - Apache FOP
 - Hyphenation files from http://offo.sourceforge.net/hyphenation/index.html
-->

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">

<!-- File extension image files will have. -->
<xsl:param name="imgext"/>

<!--
  The root template produces the XSL-FO
  document structure, including page templates etc.
  It then calls the processor to loop through
  the <ad/> elements and generate content for them.
-->
<xsl:template match="/">

<!-- XSL-FO document structure-->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xml:lang="en">
  <!-- Define master pages -->
  <fo:layout-master-set>
    <!-- TODO: separate masters for first page, left pages, right pages -->
    <!-- see http://xmlgraphics.apache.org/fop/fo.html#fo-oddeven -->
    <fo:simple-page-master master-name="page" page-height="400mm"
page-width="290mm" margin="0mm">
      <!-- Body, with columns -->
      <fo:region-body column-count="7" column-gap="0" margin-top="12mm"/>
      <!-- masthead -->
      <fo:region-before extent="10mm"/>
    </fo:simple-page-master>
    <fo:page-sequence-master master-name="pagesequence">
      <!-- If you want a different first page, use
fo:single-page-master-reference here -->
      <fo:repeatable-page-master-reference master-reference="page"/>
    </fo:page-sequence-master>

  </fo:layout-master-set>

  <!-- Define page contents -->
  <fo:page-sequence master-reference="pagesequence" language="en">

    <!-- Ad text -->
    <fo:flow flow-name="xsl-region-body">
        <!-- This should really be a fo:wrapper, but fop isn't bright
enough to cope
             with that right now and will complain about inappropriate
inline areas.
             Use a fo:block container instead until fop svn (which fixes
this)
             is released to replace 0.95 -->
        <fo:block border-left-style="solid" border-right-style="solid"
                border-left-width="0.5pt" border-right-width="0.5pt"
                border-left-color="black" border-right-color="black"
                margin-left="-0.5pt" padding-left="2pt" margin-right="0pt"
                padding-right="2pt">
                <xsl:apply-templates/>
        </fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
<!-- End XSL-FO document structure-->
</xsl:template>

<!--
Process a classification section, producing a one-column table so we can
ensure
the heading repeats on column breaks.

The table header will be provided by the <heading> element, which must
be the
first element of a <section>. Subsequent <ad> elements will go in the body.
-->
<xsl:template match="section">
    <fo:table table-layout="fixed" width="100%" space-before="4pt"
id="section_{@class_no}">
    <xsl:apply-templates select="heading"/>
    <fo:table-body>
      <fo:table-row>
        <fo:table-cell>
          <xsl:apply-templates select="ad"/>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</xsl:template>

<!--
  Process a heading
-->
<xsl:template match="heading">
  <fo:table-header>
    <fo:table-cell>
            <fo:block hyphenate="false" text-align="center"
background-color="black" color="white" font-family="Helvetica"
font-size="10pt" font-weight="bold" padding-before="3pt"
padding-after="1.5pt" margin-top="0" margin-bottom="2pt"
id="heading_{../@class_no}">
        <xsl:apply-templates/>
      </fo:block>
    </fo:table-cell>
  </fo:table-header>
</xsl:template>

<!--
  Process an ad.
  Additional top-level templates are used to handle formatting,
  so this just encloses it in a block and calls the processor.
-->
<xsl:template match="ad">
      <fo:block hyphenate="true" text-align="justify"
text-align-last="left" widows="4" orphans="4"
                border-top-width="0.2pt" border-top-style="solid"
border-top-color="black"
                padding-after="0.2pt" padding-before="0.3pt"
                font-family="Helvetica" font-weight="normal"
font-size="6.3pt"
                id="ad_{@adname}_class_{../@class_no}"
                >
      <xsl:apply-templates/>
      </fo:block>
</xsl:template>

<xsl:template match="adbody">
    <xsl:apply-templates/>
</xsl:template>

<!-- handle an external ad reference, for when an ad is an image -->
<!-- They need to be centered -->
<xsl:template match="external">
        <!-- Fop takes some persuasion to scale the pics to the right
width. The explicit "width=100%"
             appears to be necessary to get it to scale - scale-to-fit
alone won't do it. -->
        <fo:external-graphic content-width="scale-to-fit" width="100%"
content-height="auto"
                scaling="non-uniform" src="url('pics/{.}{$imgext}')" />
</xsl:template>

<!--
  Convert bold tag to XSL-FO bold inline style block
-->
<xsl:template match="b">
  <fo:inline font-weight="bold">
  <xsl:apply-templates/>
  </fo:inline>
</xsl:template>

</xsl:stylesheet>






Given XSL-FO produced by that XSLT, converted to area tree XML by FOP and
loaded into a W3C DOM (Document) using the usual Java tools, the
following code redistributes space in the columns so that the
Document can be passed back into FOP via a DOMSource and rendered to PDF.
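
For anyone unsure what "the usual Java tools" means here, the following is a
minimal sketch using only the JDK's JAXP classes. The class name, method
names, and the tiny XML string are hypothetical stand-ins, not real FOP
output, and the FOP side (producing the area tree and rendering it) is
deliberately left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AreaTreeDomSketch {

    /** Parse XML text into a namespace-aware W3C DOM Document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Namespace awareness matters if the DOM is fed back into FOP later.
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Serialize a DOM back to text with an identity transform, e.g. for inspection. */
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // A DOMSource wraps the in-memory tree so it can be used as transform input.
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<areaTree><span bpd=\"1000\"/></areaTree>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(serialize(doc));
    }
}
```

Dumping the modified tree to text like this is handy for diffing the area tree
before and after redistribution.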


      *AreaTreeTransformer.java:*


import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * AreaTreeTransformer is responsible for manipulating a loaded area tree
 * XML DOM to redistribute space, insert house ads, etc.
 *
 * @author Craig Ringer <cr...@postnewspapers.com.au>
 */
public class AreaTreeTransformer {

    private final PaginatorConfiguration conf;
    private final Document areaTree;

    private final XPathFactory xpathFactory;
    private final XPathExpression findColumnsInDocument;
    private final XPathExpression findAdsInColumn;

    /**
     * Prepare a new transformer to operate on the passed area tree XML.
     *
     * @param conf PaginatorConfiguration to provide factories required
     * @param areaTree W3C DOM containing area tree XML to process
     * @throws XPathExpressionException
     */
    public AreaTreeTransformer(PaginatorConfiguration conf, Document
areaTree) throws XPathExpressionException {
        this.conf = conf;
        this.areaTree = areaTree;
        this.xpathFactory = conf.getXPathFactory();

        // This expression locates all columns in the document. It'll be
        // called with the document root node as an argument.
        findColumnsInDocument =
            xpathFactory.newXPath().compile("//span/flow");

        // This expression locates all ad and heading nodes within a column.
        // It'll be called with a column node, as returned by
        // findColumnsInDocument, as an argument.
        findAdsInColumn = xpathFactory.newXPath().compile(
            ".//block[starts-with(@prod-id,'ad_') or "
            + "starts-with(@prod-id,'heading_')]");
    }

    /**
     * Transformation of the document is done on a column-by-column basis.
     * First, we must find all the columns and iterate over them. Then,
     * within each column, we must find the amount of white space that must
     * be consumed.
     *
     * Once the white space is known, the decision of what to do with it
     * must be made. Should it just be redistributed? Or should a house ad
     * be inserted? Or (for the final column) should it be left empty for
     * other content to be put in?
     *
     * Once any house ads are inserted, the remaining white space must be
     * distributed between all the ads.
     *
     * For some basic info on the XPath API see
     * http://www.ibm.com/developerworks/library/x-javaxpathapi.html
     */
    public void doTransform() throws XPathExpressionException {
        // First we must find the columns. Each column in the area tree is
        // identified by a flow element under a span element, so it's easy
        // to find them.
        NodeList columnList = (NodeList)
            findColumnsInDocument.evaluate(areaTree, XPathConstants.NODESET);
        for ( int i = 0; i < columnList.getLength(); i++ ) {
            Element flowNode = (Element)columnList.item(i);
            // For each column, we must determine how much free space is in
            // the column. This is the difference between the block
            // progression dimension of the span (i.e. the max column height)
            // and the block progression dimension of the flow containing
            // the column itself.
            final int spanBpd = Integer.parseInt(
                ((Element)flowNode.getParentNode()).getAttribute("bpd"));
            final int flowBpd =
                Integer.parseInt(flowNode.getAttribute("bpd"));
            if (flowBpd == 0) {
                // Empty column.
                // TODO: The last column BEFORE the empty column may need
                // special treatment, so we might need to add lookahead.
                // OTOH, there's no guarantee there will be any empty cols -
                // the last col might be on the end of a page.
                continue;
            }
            final double spaceToFill = (double)spanBpd - (double)flowBpd;
           
            // TODO: determine optimal house ad(s) to consume this space
            // and append them to the column, increasing the flow b-p-d as
            // necessary.
            final double spaceToDistribute =
                addHouseAds(flowNode, spaceToFill);
            // Now redistribute space within the column so that ads use up
            // all the space. To do this, we find all ads (and headings) in
            // the column, and then divide the space evenly between all
            // except the first block in the column. We then distribute that
            // space among those blocks by adding it to each block's
            // space-before.
            //
            // For each node to which space is added, we must update the
            // b-p-d of all parent blocks up to the flow level, so that
            // everything starts in the right places and all the children
            // fit inside their containing parents. The easiest way to do
            // that is to walk up the ancestor tree adding to the b-p-d of
            // each node along the way.
            NodeList adsAndHeads = (NodeList)
                findAdsInColumn.evaluate(flowNode, XPathConstants.NODESET);
            // Distribute space among all blocks EXCEPT the first, which
            // shouldn't get any because we want it flush with the top margin.
            final int numBlocksToPad = adsAndHeads.getLength() - 1;
            if (numBlocksToPad <= 0) {
                // Zero or one block in this column: nothing to pad.
                System.err.println("Cannot distribute space in column: "
                    + "fewer than two ad/heading blocks in column");
                continue;
            }
            final double extraSpacePerBlock =
                spaceToDistribute / numBlocksToPad;

            // Start padding AFTER first block
            for ( int j = 1 ; j < adsAndHeads.getLength(); j++ ) {
                Element block = (Element) adsAndHeads.item(j);
                padBlock( block, flowNode, extraSpacePerBlock );
            }
        }
    }

    private double addHouseAds(Element columnElement, double spaceToFill) {
        // TODO: use conf object to obtain house ad dimensions, determine
        // best fit, and insert ads into area tree.
        //
        // Currently no ads added, return original space
        return spaceToFill;
    }

    /*
     * Add `extraSpacePerBlock' to space-before on block, adding the
     * attribute if it is missing and otherwise increasing its value by the
     * specified amount.
     *
     * Then scan up the ancestor tree, and for each ancestor with a bpd
     * attribute (block progression dimension) between the block and the
     * surrounding flow element, inclusive, increase the bpd of that element
     * by extraSpacePerBlock.
     */
    private void padBlock(Element block, Element flowNode, double
extraSpacePerBlock) {
        double newSpaceBefore = extraSpacePerBlock;
        if (block.hasAttribute("space-before")) {
            newSpaceBefore +=
Integer.parseInt(block.getAttribute("space-before"));
        }
        String roundedSpaceBefore =
Long.toString(Math.round(newSpaceBefore));
        block.setAttribute("space-before", roundedSpaceBefore);

        Element parent = (Element)block.getParentNode();
        do {
            if (parent.hasAttribute("bpd")) {
                long newBpd = Math.round(extraSpacePerBlock +
Integer.parseInt(parent.getAttribute("bpd")));
                parent.setAttribute("bpd", Long.toString(newBpd));
            }
            if (flowNode.isSameNode(parent))
                break;
        } while ( (parent = (Element)parent.getParentNode()) != null );
    }

}
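
To see the selection and padding logic in isolation, here is a
self-contained sketch that runs the same two XPath expressions against a toy
fragment. The fragment is a hypothetical, heavily simplified stand-in for one
column (a real area tree has many more elements and attributes), and the class
and helper names are made up for illustration. It assumes the sample always
contains at least two matching blocks, so the guard from doTransform is
omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PadBlockSketch {

    // Hypothetical stand-in for one column: 1000 units tall, 700 used.
    static final String SAMPLE =
          "<span bpd='1000'><flow bpd='700'><block bpd='700'>"
        + "<block prod-id='heading_1700' bpd='100'/>"
        + "<block prod-id='ad_bbq' bpd='300'/>"
        + "<block prod-id='ad_latin' bpd='300' space-before='10'/>"
        + "</block></flow></span>";

    /** Add delta to an integer attribute, treating a missing attribute as 0. */
    static void addTo(Element e, String attr, double delta) {
        int old = e.hasAttribute(attr) ? Integer.parseInt(e.getAttribute(attr)) : 0;
        e.setAttribute(attr, Long.toString(Math.round(old + delta)));
    }

    /** Pad the sample column and summarise the resulting attribute values. */
    static String padSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Element flow = (Element) xpath.evaluate("//span/flow", doc, XPathConstants.NODE);
            Element span = (Element) flow.getParentNode();
            int free = Integer.parseInt(span.getAttribute("bpd"))
                     - Integer.parseInt(flow.getAttribute("bpd"));
            NodeList blocks = (NodeList) xpath.evaluate(
                    ".//block[starts-with(@prod-id,'ad_') or starts-with(@prod-id,'heading_')]",
                    flow, XPathConstants.NODESET);
            // The first block stays flush with the column top, so it is not counted.
            double extra = (double) free / (blocks.getLength() - 1);
            for (int j = 1; j < blocks.getLength(); j++) {
                Element block = (Element) blocks.item(j);
                addTo(block, "space-before", extra);
                // Grow every ancestor that has a bpd, up to and including the flow.
                for (Node n = block.getParentNode(); n instanceof Element; n = n.getParentNode()) {
                    Element ancestor = (Element) n;
                    if (ancestor.hasAttribute("bpd")) {
                        addTo(ancestor, "bpd", extra);
                    }
                    if (ancestor.isSameNode(flow)) {
                        break;
                    }
                }
            }
            return "flow=" + flow.getAttribute("bpd")
                 + " bbq=" + ((Element) blocks.item(1)).getAttribute("space-before")
                 + " latin=" + ((Element) blocks.item(2)).getAttribute("space-before");
        } catch (Exception e) {
            throw new IllegalStateException("Sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(padSample()); // prints flow=1000 bbq=150 latin=160
    }
}
```

The 300 free units are split between the two blocks after the heading, the
existing space-before of 10 is kept and added to, and the flow ends up exactly
as tall as its span.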


      PaginatorConfiguration.java


import java.io.File;
import java.net.MalformedURLException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;
import org.apache.fop.apps.FopFactory;

/**
 * PaginatorConfiguration tracks instances of configured factories required
 * for parsing, formatting, etc.
 *
 * @author Craig Ringer <cr...@postnewspapers.com.au>
 */
public class PaginatorConfiguration {

    private final FopFactory fopFactory = FopFactory.newInstance();
    private final TransformerFactory xsltFactory =
TransformerFactory.newInstance();
    private final DocumentBuilderFactory documentBuilderFactory =
DocumentBuilderFactory.newInstance();
    private final XPathFactory xpathFactory = XPathFactory.newInstance();
   
    // TODO: load from resource
    private final File adsToFoXSLTFile = new File("ads_to_fo.xsl");

    public PaginatorConfiguration() throws MalformedURLException {
        // Namespace awareness is required if feeding the dom back into fop.
        documentBuilderFactory.setNamespaceAware(true);
        // TODO: configure font base, image base, etc here.
        File cwd = new File( System.getProperty("user.dir") );
        fopFactory.setBaseURL( cwd.toURI().toString() );
        fopFactory.setFontBaseURL( (new
File(cwd,"fonts")).toURI().toString() );
        fopFactory.setSourceResolution(200);
        fopFactory.setTargetResolution(200);
        // TODO: download and paginate house ads
    }

    public FopFactory getFopFactory() {
        return fopFactory;
    }

    public TransformerFactory getTransformerFactory() {
        return xsltFactory;
    }

    public DocumentBuilderFactory getDocBuilderFactory() {
        return documentBuilderFactory;
    }

    public XPathFactory getXPathFactory() {
        return xpathFactory;
    }

    public File getAdsToFoXSLTFile() {
        return adsToFoXSLTFile;
    }

}

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/