Posted to dev@forrest.apache.org by Jeff Turner <je...@apache.org> on 2003/08/10 11:21:09 UTC
Re: Metadata
Hi Jason,
This stuff sounds great :) I look forward to playing with it.
One thought: how about generalising this extra pipeline, and calling it
'metadata' or 'meta' instead of 'head'?
In your implementation, everything in the **head-* pipeline originates in
the XML <header> tag, and ends up in the HTML <head> tag. Hence naming
the pipeline '**head-*' makes sense. But I think we can generalize this:
- Not all metadata comes from the <header> tag. For instance, we could:
  - fetch the page's 'Last Modified' timestamp from the filesystem.
  - poke CVS and obtain lots of info about a file from there
  - use intelligent software to parse the XML, infer what concepts are
    present in the page and automatically generate metadata [1]
  - Add a 'Creator' field, specifying the Forrest version used to create
    the page.
- Not all metadata is used solely in the HTML <head> tag. I'd like to
  put the 'Last Modified' date in the page body, like Maven sites (see
  maven.apache.org) do.
So based on this, we could have a '**metadata-*.html' pipeline that
serves up XML conforming to a standard metadata format like Dublin
Core (http://dublincore.org/):
<metadata xmlns="http://apache/org/forrest/metadata/1.0"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>
    Essex Conservatories-Direct : The Local Answer To Your
    Conservatory Needs.
  </dc:title>
  <dc:creator>
    Apache Forrest 0.5
  </dc:creator>
  <dc:description>
    Essex, Quality conservatories and sunrooms direct and online. The
    Local Answer To Your Conservatory Needs. testing, 1, 2, 3, testing
    description
  </dc:description>
  <dc:publisher>
    YourCompany
  </dc:publisher>
  <dc:identifier>
    http://yourcompany.com/index.html
  </dc:identifier>
  <dc:language>en</dc:language>
  <dc:date>created: 2002-10-27; modified: 2002-09-20</dc:date>
</metadata>
There is a list of standard DC elements at
http://dublincore.org/documents/dces/.
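
As a rough sketch of what the stylesheet behind such a pipeline might
look like, the XSLT 1.0 below maps the <header> of a document to the
Dublin Core format shown above. The stylesheet itself, its 'creator' and
'language' parameters, and the use of dc:subject for keywords are
assumptions for illustration, not existing Forrest code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical stylesheet; the name and parameters are illustrative only. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apache/org/forrest/metadata/1.0">

  <!-- Assumed parameters; the sitemap would have to pass real values. -->
  <xsl:param name="creator" select="'Apache Forrest'"/>
  <xsl:param name="language" select="'en'"/>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/document">
    <metadata>
      <dc:title>
        <xsl:value-of select="normalize-space(header/title)"/>
      </dc:title>
      <dc:creator>
        <xsl:value-of select="$creator"/>
      </dc:creator>
      <!-- Emit a description only when the page declares one. -->
      <xsl:for-each select="header/meta[@name='description']">
        <dc:description>
          <xsl:value-of select="normalize-space(.)"/>
        </dc:description>
      </xsl:for-each>
      <!-- Keywords are mapped to dc:subject (an assumption). -->
      <xsl:for-each select="header/meta[@name='keywords']">
        <dc:subject>
          <xsl:value-of select="normalize-space(.)"/>
        </dc:subject>
      </xsl:for-each>
      <dc:language>
        <xsl:value-of select="$language"/>
      </dc:language>
    </metadata>
  </xsl:template>

</xsl:stylesheet>
```

Metadata that does not live in the source XML (the 'Last Modified'
timestamp, CVS info) would have to be merged in by some other pipeline
component; this sketch only covers the <header> part.
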
--Jeff
[1] See http://directory.google.com/Top/Reference/Knowledge_Management/Knowledge_Retrieval/Classification/Software/?il=1
I have used Klarity (http://archive.klarity.com.au/) before for this.
On Fri, Aug 08, 2003 at 03:38:31PM +0100, g4 wrote:
> Hi Jeff, how's it going?
>
> OK I've been tackling this metadata issue we talked about. Just want to
> make sure I'm heading in the right direction and get some feedback, so
> this is what I've done:
>
> OK so we have this as an example content XML page:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0a//EN"
>   "document-v20.dtd">
> <document>
>   <header>
>     <title>Essex</title>
>     <!--<authors>
>       <person name="Jeff Turner" email="jefft@apache.org"/>
>     </authors>-->
>     <meta name="keywords">testing, 1, 2, 3, testing keyword</meta>
>     <meta name="description">testing, 1, 2, 3, testing description</meta>
>   </header>
>   <body>
>     <section>
>       <title>to go</title>
>       <subtitle>The Local Answer To Your Conservatory Needs.</subtitle>
>       <tagline>Quality conservatories and sunrooms direct and online.</tagline>
>       <p>You have successfully generated and rendered an
>       <link href="ext:forrest">Apache Forrest</link> site. This page is from the
>       site template. It is found in
>       <code>my-site/src/documentation/content/xdocs/index.xml</code>
>       Please edit it and replace this text with content of
>       your own.</p>
>     </section>
>   </body>
> </document>
>
> 1) so I created a new sitemap.xmap resource called "head"
>
> <map:resource name="head">
>   <map:transform src="skins/{forrest:skin}/xslt/html/{type}.xsl">
>     <!-- Can set an alternative project skinconfig here
>     <map:parameter name="config-file"
>                    value="../../../../skinconf.xml"/>
>     -->
>     <map:parameter name="path" value="{path}"/>
>   </map:transform>
>
>   <map:serialize/>
> </map:resource>
>
> 2) We then have a new pipeline, thus:
>
> <!-- header -->
> <map:match pattern="**head-*.html">
>   <map:generate src="cocoon:/{1}{2}.xml"/>
>   <map:transform type="linkrewriter"
>                  src="cocoon:/{1}linkmap-{2}.html"/>
>   <map:call resource="head">
>     <map:parameter name="type" value="head2html"/>
>     <map:parameter name="path" value="{1}{2}.html"/>
>   </map:call>
> </map:match>
>
> 3) And then aggregate the whole lot:
>
> <map:part src="cocoon:/head-{0}"/>
>
> 4) I thought that transforming the head separately made a bit more
> sense. My only concern is whether it will slow things down when we have
> large content files, since the content is essentially being parsed
> twice, no? Anyway, the XSL for this (head2html):
>
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>
>   <xsl:param name="path"/>
>   <xsl:include href="../../../common/xslt/html/dotdots.xsl"/>
>   <xsl:include href="../../../common/xslt/html/pathutils.xsl"/>
>
>   <xsl:variable name="filename-noext">
>     <xsl:call-template name="filename-noext">
>       <xsl:with-param name="path" select="$path"/>
>     </xsl:call-template>
>   </xsl:variable>
>
>   <xsl:variable name="root">
>     <xsl:call-template name="dotdots">
>       <xsl:with-param name="path" select="$path"/>
>     </xsl:call-template>
>   </xsl:variable>
>
>   <xsl:template match="/">
>     <head>
>       <link rel="stylesheet" href="{$root}skin/main.css"
>             type="text/css"/>
>       <xsl:apply-templates/>
>     </head>
>   </xsl:template>
>
>   <xsl:template match="header">
>     <xsl:apply-templates/>
>   </xsl:template>
>
>   <xsl:template match="title">
>     <title><xsl:value-of select="."/> Conservatories-Direct :
>     <xsl:value-of select="//subtitle/."/></title>
>   </xsl:template>
>
>   <xsl:template match="meta">
>     <xsl:if test="@name='description'">
>       <meta content="{//title/.}, {//tagline/.} {//subtitle/.} {.}" name="{@name}"/>
>     </xsl:if>
>     <xsl:if test="@name='keywords'">
>       <meta content="{//title/.},{.}" name="{@name}"/>
>     </xsl:if>
>   </xsl:template>
>
>   <xsl:template match="body">
>     <!-- ignore the <body/> part -->
>   </xsl:template>
>
> </xsl:stylesheet>
>
>
> 5) Finally we call the head from within "site2html":
>
> ...
> <xsl:call-template name="head"/>
> ...
> <xsl:template name="head">
>
>   <xsl:comment>================= start Metadata items
>   ==================</xsl:comment>
>   <xsl:apply-templates select="head"/>
>   <xsl:comment>================= end Metadata items
>   ==================</xsl:comment>
>
> </xsl:template>
>
> ....
>
> This produces:
>
> <head>
>   <META http-equiv="Content-Type" content="text/html; charset=utf-8">
>   <link type="text/css" href="../skin/main.css" rel="stylesheet">
>   <title>Essex Conservatories-Direct : The Local Answer To Your
>   Conservatory Needs.</title>
>   <meta name="keywords" content="Essex,testing, 1, 2, 3, testing keyword">
>   <meta name="description" content="Essex, Quality conservatories and
>   sunrooms direct and online. The Local Answer To Your Conservatory
>   Needs. testing, 1, 2, 3, testing description">
> </head>
>
>
> I am in the process of working out a character limit for the meta
> keywords and description; this should stop the tags from being
> overpopulated with data, should this ever arise.
>
> Let me know if this is what you were thinking of, otherwise I can
> re-work it ;) Also how would I go about submitting this, when it's
> finished?
>
> Kind regards
>
> Jason Lane
>
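
On the character limit mentioned near the end of the quoted message: one
possible approach in XSLT 1.0 is a named template built on substring().
This is only a sketch. The template name 'truncate', the
'max-meta-length' parameter and its default of 200 are made up for
illustration, and the two templates are meant to be merged into
head2html rather than run as a standalone stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed limit; 200 is an arbitrary illustrative value. -->
  <xsl:param name="max-meta-length" select="200"/>

  <!-- Collapse whitespace, then keep at most $max characters.
       substring() returns the whole string when it is already shorter. -->
  <xsl:template name="truncate">
    <xsl:param name="text"/>
    <xsl:param name="max" select="$max-meta-length"/>
    <xsl:value-of select="substring(normalize-space($text), 1, $max)"/>
  </xsl:template>

  <!-- Copy description and keywords into HTML meta tags, truncated. -->
  <xsl:template match="meta[@name='description' or @name='keywords']">
    <meta name="{@name}">
      <xsl:attribute name="content">
        <xsl:call-template name="truncate">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:attribute>
    </meta>
  </xsl:template>

</xsl:stylesheet>
```

Note that this cuts at a fixed character position, so it can split a
word or a keyword in half; cutting at the last comma or space before the
limit would need a small recursive template.
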
Re: Metadata
Posted by g4 <ja...@root10.net>.
On Sunday, Aug 10, 2003, at 10:21 Europe/London, Jeff Turner wrote:
> Hi Jason,
>
> This stuff sounds great :) I look forward to playing with it.
Cool. I obviously now need to apply this to Forrest in general and not
just my project; it's too specific at the moment :)
>
> One thought: how about generalising this extra pipeline, and calling it
> 'metadata' or 'meta' instead of 'head'?
>
> In your implementation, everything in the **head-* pipeline originates in
> the XML <header> tag, and ends up in the HTML <head> tag. Hence naming
> the pipeline '**head-*' makes sense. But I think we can generalize this:
>
> - Not all metadata comes from the <header> tag. For instance, we could:
>   - fetch the page's 'Last Modified' timestamp from the filesystem.
>   - poke CVS and obtain lots of info about a file from there
>   - use intelligent software to parse the XML, infer what concepts are
>     present in the page and automatically generate metadata [1]
>   - Add a 'Creator' field, specifying the Forrest version used to create
>     the page.
Also, do you think that much of the meta content itself could be
generated? For example, "my-project" could obviously be used for titles
and metadata, as could any page subtitles. This way much of the
metadata is generated from the page, and only the specifics such as
keywords need to be written out explicitly.
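
A minimal sketch of that idea, assuming the page layout from the example
(title in <header>, subtitle somewhere in the body): use the explicit
description when the author wrote one, and otherwise build one from the
title and the first subtitle. The fallback rule is an assumption for
illustration, not existing Forrest behaviour.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/document">
    <head>
      <title><xsl:value-of select="normalize-space(header/title)"/></title>
      <meta name="description">
        <xsl:attribute name="content">
          <xsl:choose>
            <!-- An explicit description in the source always wins. -->
            <xsl:when test="header/meta[@name='description']">
              <xsl:value-of
                  select="normalize-space(header/meta[@name='description'])"/>
            </xsl:when>
            <!-- Otherwise generate one from the title and first subtitle.
                 concat() uses the first matching node in document order. -->
            <xsl:otherwise>
              <xsl:value-of
                  select="normalize-space(concat(header/title, ' ',
                                                 body//subtitle))"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
      </meta>
    </head>
  </xsl:template>

</xsl:stylesheet>
```
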
>
> - Not all metadata is used solely in the HTML <head> tag. I'd like to
> put the 'Last Modified' date in the page body, like Maven sites (see
> maven.apache.org) do.
Nice!
>
> So based on this, we could have a '**metadata-*.html' pipeline that
> serves up XML conforming to a standard metadata format like Dublin
> Core (http://dublincore.org/):
Yup, I thought about DC. I hear what you say about generalising this
part.
>
> <metadata xmlns="http://apache/org/forrest/metadata/1.0"
>           xmlns:dc="http://purl.org/dc/elements/1.1/">
>   <dc:title>
>     Essex Conservatories-Direct : The Local Answer To Your
>     Conservatory Needs.
>   </dc:title>
>   <dc:creator>
>     Apache Forrest 0.5
>   </dc:creator>
>   <dc:description>
>     Essex, Quality conservatories and sunrooms direct and online. The
>     Local Answer To Your Conservatory Needs. testing, 1, 2, 3, testing
>     description
>   </dc:description>
>   <dc:publisher>
>     YourCompany
>   </dc:publisher>
>   <dc:identifier>
>     http://yourcompany.com/index.html
>   </dc:identifier>
>   <dc:language>en</dc:language>
>   <dc:date>created: 2002-10-27; modified: 2002-09-20</dc:date>
> </metadata>
>
> There is a list of standard DC elements at
> http://dublincore.org/documents/dces/.
>
>
OK thanks Jeff, I'll get back to you with some improvements ;)
> --Jeff
>
>
> [1] See
> http://directory.google.com/Top/Reference/Knowledge_Management/Knowledge_Retrieval/Classification/Software/?il=1
> I have used Klarity (http://archive.klarity.com.au/) before for this.
>
Jason Lane