Posted to users@cocoon.apache.org by Fred Toth <ft...@synernet.com> on 2003/09/09 16:00:54 UTC

Large documents and fragments?

Hi,

We work in the scientific publishing industry, and our typical source
materials are fairly large XML files that contain a journal article with all
the usual stuff: abstracts, bibliographic references, figures, tables, etc.

One of these documents typically yields multiple individual pages. For
example, we will have an abstract page, a full-text page, a figure 1 page,
etc. Further, we will aggregate bits of 50 or so documents to produce a
table of contents.

I am looking for the best way to approach this with Cocoon. It seems
impractical to have a single source document drive all of these pages. I'm
wondering if the document should be split up into fragments. How would
something like this be done with Cocoon? Can you serialize to a disk file?

Also note that we are likely to be generating HTML offline rather than using
Cocoon for serving pages. But we want to be able to take advantage of
sitemaps, pipelines, and all the other goodies to get the job done.

This might be a bit outside of normal Cocoon usage. Has anyone else
had any experience with this approach? Am I missing something obvious?
Is there a better way?

Many thanks!

Fred



RE: Large documents and fragments?

Posted by Olivier Lange <wi...@petit-atelier.ch>.
> From what I understand so far about Cocoon, it seems I would have to parse
> the file 5 or more times, once for each of the output page types. Is there
> a better way?

Cocoon has a powerful caching mechanism built in, so it probably won't need
to parse the file each time.

That said, it depends on your pipelines; not all content can be cached. If
you intend to process static XML files, though, they will be cached once
parsed.
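
If memory serves, in Cocoon 2.1 you can even choose the pipeline
implementation per <map:pipeline> (an assumption on my part; check the pipe
components declared in your sitemap). Something like:

<!-- Cached pipeline (the usual default) -->
<map:pipeline type="caching">
  ...
</map:pipeline>

<!-- For request-dependent content that must not be cached -->
<map:pipeline type="noncaching">
  ...
</map:pipeline>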

> I feel like I need a process to make XML fragments for these, then call
> them individually for processing. Or is that not the Cocoon way?

It is the Cocoon way!

There are many ways to split your processing and to reuse components.

Here is one:

<!-- Internal pipeline: produces XML content, does not match URIs -->

<map:pipeline internal-only="true">

  <!-- Generate the full source document -->
  <map:match pattern="single-source(text-full).xml">
    <map:generate type="file" src="single-source.xml"/>
    <map:serialize type="xml"/>
  </map:match>

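  <!-- Filter one section from the source document -->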
  <map:match pattern="single-source(text-section*).xml">
    <map:generate type="file" src="cocoon:/single-source(text-full).xml"/>
    <map:transform type="xslt" src="filter-section.xsl">
      <map:parameter name="idrefSection" value="{1}"/>
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>

  <!-- Filter one or more figures from the source document -->
  <map:match pattern="single-source(figure*).xml">
    <map:generate type="file" src="cocoon:/single-source(text-full).xml"/>
    <map:transform type="xslt" src="filter-figure.xsl">
      <map:parameter name="idrefFigure" value="{1}"/>
      <!-- If idrefFigure == "", you could design the transformation
           to return all figures -->
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>
  ...
  <!-- the same for abstract, tables, ... -->
  ...
</map:pipeline>
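
For completeness, here is a minimal sketch of what filter-section.xsl could
look like. It assumes the article markup uses <section> elements with id
attributes (the real element and attribute names depend on your DTD), and
relies on Cocoon's XSLT transformer passing <map:parameter> values to the
stylesheet as top-level xsl:param values:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Filled in by the sitemap via <map:parameter name="idrefSection"/> -->
  <xsl:param name="idrefSection"/>

  <!-- Emit only the requested section, wrapped in a stable root element -->
  <xsl:template match="/">
    <text>
      <xsl:copy-of select="//section[@id = $idrefSection]"/>
    </text>
  </xsl:template>

</xsl:stylesheet>

The same pattern, with a different select expression, would serve for
filter-figure.xsl and the table and abstract filters.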

<!-- Driving pipeline, does match URIs -->

<map:pipeline>
  <!-- The full text, with figures, tables, abstract, toc, ... -->
  <map:match pattern="full-text.html">
    <map:generate type="file" src="cocoon:/single-source(text-full).xml"/>
    <map:transform type="xslt" src="layout-text-to-xhtml.xsl"/>
    <map:transform type="xslt" src="layout-abstract-to-xhtml.xsl"/>
    <map:transform type="xslt" src="layout-figures-to-xhtml.xsl"/>
    <map:transform type="xslt" src="layout-tables-to-xhtml.xsl"/>
    <map:transform type="xslt" src="layout-toc-to-xhtml.xsl"/>
    <map:serialize type="xhtml"/>
  </map:match>

  <!-- A specific figure -->
  <map:match pattern="figure*.html">
    <map:generate type="file" src="cocoon:/single-source(figure{1}).xml"/>
    <map:transform type="xslt" src="layout-figures-to-xhtml.xsl"/>
    <map:serialize type="xhtml"/>
  </map:match>

  ...
  <!-- the same for abstract, tables, ... -->
  ...

  <!-- Custom text assembly -->
  <map:match pattern="custom-text.html">
    <map:aggregate element="text">
      <map:part src="cocoon:/single-source(text-section1).xml" strip-root="true"/>
      <map:part src="cocoon:/single-source(figure6).xml" strip-root="true"/>
      <map:part src="cocoon:/single-source(text-section9).xml" strip-root="true"/>
      <map:part src="cocoon:/single-source(table3).xml" strip-root="true"/>
    </map:aggregate>
    <map:transform type="xslt" src="layout-text-to-xhtml.xsl"/>
    <map:transform type="xslt" src="layout-figures-to-xhtml.xsl"/>
    <map:transform type="xslt" src="layout-tables-to-xhtml.xsl"/>
    <map:serialize type="xhtml"/>
  </map:match>

</map:pipeline>

The simplest approach is to use pipelines and/or matchers. You could also
rely on resources (<map:call resource="...">) and views. I use views for
debugging and alternate layouts, and make extensive use of resources and
pipelines.
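
As a sketch of the resource approach (the names are invented, and I'm
assuming resource parameters are available through {name} substitution, as
in Cocoon 2.1), you could factor the recurring transform-and-serialize steps
into a named resource and call it from several matchers:

<map:resources>
  <map:resource name="render-as-xhtml">
    <!-- {stylesheet} is supplied by the caller via <map:parameter> -->
    <map:transform type="xslt" src="{stylesheet}"/>
    <map:serialize type="xhtml"/>
  </map:resource>
</map:resources>

<map:match pattern="abstract.html">
  <map:generate type="file" src="cocoon:/single-source(abstract).xml"/>
  <map:call resource="render-as-xhtml">
    <map:parameter name="stylesheet" value="layout-abstract-to-xhtml.xsl"/>
  </map:call>
</map:match>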

Hope that helps!

Olivier




RE: Large documents and fragments?

Posted by Fred Toth <ft...@synernet.com>.
Thanks Olivier,

Yes, that does help. I read up on the command line interface and I see
that it neatly solves the serialize-to-file problem (and others). Thanks
for the tip.

But what about the fragment question? Say I have a single source file
that will generate these pages:

1. full text view
2. abstract only view
3. figure 1 page
4. figure 2 page
5. table 1 page
etc.

From what I understand so far about Cocoon, it seems I would have to parse
the file 5 or more times, once for each of the output page types. Is there
a better way?

I feel like I need a process to make XML fragments for these, then call them
individually for processing. Or is that not the Cocoon way?

Thanks again,

Fred



RE: Large documents and fragments?

Posted by Olivier Lange <wi...@petit-atelier.ch>.
> This might be a bit outside of normal Cocoon usage. Has anyone else
> had any experience with this approach? Am I missing something obvious?
> Is there a better way?

Have you seen that Cocoon can be run from the command line? In that case, it
produces static files for each matched URI in the sitemap, and Cocoon can
follow links between the files. The Cocoon documentation is built like this;
it was the original intent of Cocoon. Apache Forrest also uses Cocoon this
way, generating both static HTML and PDF documents. I'm doing just this at
the moment to generate a website offline.

> I'm wondering if the document should be split up into fragments. How would
> something like this be done with Cocoon? Can you serialize to a disk file?

Yes. Files are automatically created for the matched URIs from the
serialized content if you run Cocoon from the command line.
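
For example (from memory of the 2.1 CLI, so treat the exact flags as an
assumption and check the usage output of cocoon.sh cli for your version):

./cocoon.sh cli -d ./site full-text.html figure1.html

would render those two URIs through the sitemap and write the resulting
files under ./site.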

The generation process could be split between different matchers, each
one composing some part of the document. You could even shield inner
processing from URI matching in so-called "internal" pipelines. The
"external" pipelines would drive the processing, aggregate the content
built by internal pipelines, and further transform and serialize it.

Is that of any help?

Olivier


-----Message d'origine-----
De : Fred Toth [mailto:ftoth@synernet.com]
Envoye : mardi, 9. septembre 2003 16:01
A : users@cocoon.apache.org
Objet : Large documents and fragments?


Hi,

We work in the scientific publishing industry and our typical source
materials
are fairly large XML files that contain a journal article with all the
usual stuff,
abstracts, bibliographic references, figures, tables, etc.

One of these documents typically yields multiple individual pages. For
example,
we will have an abstract page, a full text page, a figure 1 page, etc.
Further, we
will aggregate bits of 50 documents or so to produce a table of contents.

I am looking for the best way to approach this with cocoon. It seems
impractical
to have a single source document drive all of these pages? I'm wondering
if the document should be split up into fragments. How would something like
this be done with cocoon? Can you serialize to a disk file?

Also note that we are likely to be generating HTML off line and not using
cocoon
for serving pages. But we want to be able to take advantage of sitemaps,
pipelines
and all the other goodies to get the job done.

This might be a bit outside of the normal cocoon usage. Has anyone else
had any experience with this approach? Am I missing something obvious?
Is there a better way?

Many thanks!

Fred


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org