Posted to dev@forrest.apache.org by Jeff Turner <je...@apache.org> on 2003/08/10 11:21:09 UTC

Re: Metadata

Hi Jason,

This stuff sounds great :)  I look forward to playing with it.

One thought: how about generalising this extra pipeline, and calling it
'metadata' or 'meta' instead of 'head'?

In your implementation, everything in the **head-* pipeline originates in
the XML <header> tag, and ends up in the HTML <head> tag.  Hence naming
the pipeline '**head-*' makes sense.  But I think we can generalize this:

- Not all metadata comes from the <header> tag.  For instance, we could:
  - fetch the page's 'Last Modified' timestamp from the filesystem.
  - poke CVS and obtain lots of info about a file from there
  - use intelligent software to parse the XML, infer what concepts are
    present in the page and automatically generate metadata [1]
  - Add a 'Creator' field, specifying the Forrest version used to create
    the page.

- Not all metadata is used solely in the HTML <head> tag.  I'd like to
  put the 'Last Modified' date in the page body, like Maven sites (see
  maven.apache.org) do.

So based on this, we could have a '**metadata-*.html' pipeline that
serves up XML conforming to a standard metadata format like Dublin
Core (http://dublincore.org/):

<metadata xmlns="http://apache/org/forrest/metadata/1.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>
    Essex Conservatories-Direct : The Local Answer To Your
    Conservatory Needs.
  </dc:title>
  <dc:creator>
    Apache Forrest 0.5
  </dc:creator>
  <dc:description>
    Essex, Quality conservatories and sunrooms direct and online. The
    Local Answer To Your Conservatory Needs. testing, 1, 2, 3, testing
    description
  </dc:description>
  <dc:publisher>
    YourCompany
  </dc:publisher>
  <dc:identifier>
    http://yourcompany.com/index.html
  </dc:identifier>
  <dc:language>en</dc:language>
  <dc:date>created: 2002-10-27; modified: 2002-09-20</dc:date>
</metadata>

There is a list of standard DC elements at
http://dublincore.org/documents/dces/.
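
To make the mapping concrete, here is a minimal sketch of a stylesheet that
could sit behind such a pipeline and turn a document's <header> into the
format above. It is only an illustration under assumptions: the file name
header2metadata.xsl and the 'identifier' parameter are hypothetical, the
namespace URI is copied as written in the example, and the creator string is
hard-coded rather than looked up.

```xml
<?xml version="1.0"?>
<!-- Hypothetical header2metadata.xsl: maps a document <header> to the
     Dublin Core layout shown in the example above. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns="http://apache/org/forrest/metadata/1.0">

  <!-- Assumed parameter: the sitemap would have to pass the page URL in. -->
  <xsl:param name="identifier"/>

  <xsl:template match="/document">
    <metadata>
      <dc:title>
        <xsl:value-of select="normalize-space(header/title)"/>
      </dc:title>
      <!-- Hard-coded for the sketch; not read from any Forrest setting. -->
      <dc:creator>Apache Forrest 0.5</dc:creator>
      <xsl:apply-templates select="header/meta[@name='description']"/>
      <!-- Only emitted when the caller supplied a URL. -->
      <xsl:if test="$identifier != ''">
        <dc:identifier>
          <xsl:value-of select="$identifier"/>
        </dc:identifier>
      </xsl:if>
    </metadata>
  </xsl:template>

  <xsl:template match="meta[@name='description']">
    <dc:description>
      <xsl:value-of select="normalize-space(.)"/>
    </dc:description>
  </xsl:template>

</xsl:stylesheet>
```

Fields such as dc:date or dc:publisher are left out because they would have
to come from somewhere other than the <header> tag.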


--Jeff


[1] See http://directory.google.com/Top/Reference/Knowledge_Management/Knowledge_Retrieval/Classification/Software/?il=1
    I have used Klarity (http://archive.klarity.com.au/) before for this.

On Fri, Aug 08, 2003 at 03:38:31PM +0100, g4 wrote:
> Hi Jeff, how's it going?
> 
> OK I've been tackling this metadata issue we talked about. Just want to 
> make sure I'm heading in the right direction and get some feedback, so 
> this is what I've done:
> 
> OK so we have this as an example content XML page:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0a//EN" 
> "document-v20.dtd">
> <document>
> 	<header>
> 		<title>Essex</title>
> 		<!--<authors>
> 			<person name="Jeff Turner" email="jefft@apache.org"/>
> 		</authors>-->
> 		<meta name="keywords">testing, 1, 2, 3, testing 
> 		keyword</meta>
> 		<meta name="description">testing, 1, 2, 3, testing 
> 		description</meta>
> 	</header>
> 	<body>
> 		<section>
> 			<title>to go</title>
> 			<subtitle>The Local Answer To Your Conservatory 
> 			Needs.</subtitle>
> 			<tagline>Quality conservatories and sunrooms direct 
> 			and online.</tagline>
> 			<p>You have successfully generated and rendered an 
> 			<link href="ext:forrest">Apache Forrest</link> site. This page is
> 			from the site template. It is found in
> 			<code>my-site/src/documentation/content/xdocs/index.xml</code>
> 			Please edit it and replace this text with content of 
> 			your own.</p>
> 		</section>
> 	</body>
> </document>
> 
> 1) so I created a new sitemap.xmap resource called "head"
> 
> <map:resource name="head">
>   <map:transform src="skins/{forrest:skin}/xslt/html/{type}.xsl">
>     <!-- Can set an alternative project skinconfig here
>     <map:parameter name="config-file" value="../../../../skinconf.xml"/>
>     -->
>     <map:parameter name="path" value="{path}"/>
>   </map:transform>
>   <map:serialize/>
> </map:resource>
> 
> 2) We then have a new pipeline, thus:
> 
> <!-- header -->
> <map:match pattern="**head-*.html">
>   <map:generate src="cocoon:/{1}{2}.xml"/>
>   <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
>   <map:call resource="head">
>     <map:parameter name="type" value="head2html"/>
>     <map:parameter name="path" value="{1}{2}.html"/>
>   </map:call>
> </map:match>
> 
> 3) And then aggregate the whole lot:
> 
> <map:part src="cocoon:/head-{0}"/>
> 
> 4) Transforming the head separately seemed to make more sense to me. My 
> only concern is speed: with large content files, the content is 
> essentially parsed twice, no? Anyway, here is the XSL for this (head2html):
> 
> <xsl:stylesheet version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> 
> 	<xsl:param name="path"/>
> 	<xsl:include href="../../../common/xslt/html/dotdots.xsl"/>
> 	<xsl:include href="../../../common/xslt/html/pathutils.xsl"/>
> 
> 	<xsl:variable name="filename-noext">
> 		<xsl:call-template name="filename-noext">
> 			<xsl:with-param name="path" select="$path"/>
> 		</xsl:call-template>
> 	</xsl:variable>
> 	
> 	<xsl:variable name="root">
> 		<xsl:call-template name="dotdots">
> 			<xsl:with-param name="path" select="$path"/>
> 		</xsl:call-template>
> 	</xsl:variable>
> 	
> 	<xsl:template match="/">
> 		<head>
> 			<link rel="stylesheet" href="{$root}skin/main.css" 
> 			type="text/css"/>
> 		<xsl:apply-templates/>
> 		</head>
> 	</xsl:template>
> 	
> 	<xsl:template match="header">
> 		<xsl:apply-templates/>
> 	</xsl:template>
> 	
> 	<xsl:template match="title">
> 		<title><xsl:value-of select="."/> Conservatories-Direct : 
> <xsl:value-of select="//subtitle/."/></title>
> 	</xsl:template>
> 
> 	<xsl:template match="meta">
> 		<xsl:if test="@name='description'">
> 			<meta content="{//title/.}, {//tagline/.} 
> 			{//subtitle/.} {.}" name="{@name}"/>
> 		</xsl:if>
> 		<xsl:if test="@name='keywords'">
> 			<meta content="{//title/.},{.}" name="{@name}"/>
> 		</xsl:if>
> 	</xsl:template>
> 	
> 	<xsl:template match="body">
> 		<!-- ignore the <body/> part -->
> 	</xsl:template>
> 
> </xsl:stylesheet>
> 
> 
> 5) Finally we call the head from within "site2html",
> 
> ...
> <xsl:call-template name="head"/>
> ...
> <xsl:template name="head">
> 
> 	<xsl:comment>================= start Metadata items ==================</xsl:comment>
> 	<xsl:apply-templates select="head"/>
> 	<xsl:comment>================= end Metadata items ==================</xsl:comment>
> 
> </xsl:template>
> 
> ....
> 
> This produces :
> 
> <head>
> <META http-equiv="Content-Type" content="text/html; charset=utf-8">
> <link type="text/css" href="../skin/main.css" rel="stylesheet">
> <title>Essex Conservatories-Direct : The Local Answer To Your 
> Conservatory Needs.</title>
> <meta name="keywords" content="Essex,testing, 1, 2, 3, testing keyword">
> <meta name="description" content="Essex, Quality conservatories and 
> sunrooms direct and online. The Local Answer To Your Conservatory 
> Needs. testing, 1, 2, 3, testing description">
> </head>
> 
> 
> I am in the process of working out a character limit for the meta 
> keywords and description. This should stop the tags from being 
> overpopulated with data, should that ever arise.
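
One way such a limit might look in XSLT 1.0 is sketched below, assuming it is
enough to collapse whitespace and cut the text at a fixed length. The
'max-length' parameter, its value of 150, and the 'truncate' template are
hypothetical names for illustration. Unlike head2html above, the sketch does
not prepend the title or tagline to the content.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed limit; 150 is an arbitrary value chosen for illustration. -->
  <xsl:param name="max-length" select="150"/>

  <!-- Only the header's <meta> elements are processed. -->
  <xsl:template match="/">
    <head>
      <xsl:apply-templates select="document/header/meta"/>
    </head>
  </xsl:template>

  <xsl:template match="meta[@name='description' or @name='keywords']">
    <meta name="{@name}">
      <xsl:attribute name="content">
        <xsl:call-template name="truncate">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:attribute>
    </meta>
  </xsl:template>

  <!-- Any other <meta> element is dropped in this sketch. -->
  <xsl:template match="meta"/>

  <!-- Hypothetical helper: collapse whitespace, then keep at most
       max-length characters. -->
  <xsl:template name="truncate">
    <xsl:param name="text"/>
    <xsl:value-of
      select="substring(normalize-space($text), 1, $max-length)"/>
  </xsl:template>

</xsl:stylesheet>
```

A hard cut like this can split a word or a keyword in two, so a real version
would probably want to back up to the previous space or comma.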
> 
> Let me know if this is what you were thinking of, otherwise I can 
> re-work it ;) Also how would I go about submitting this, when it's 
> finished?
> 
> Kind regards
> 
> Jason Lane
> 

Re: Metadata

Posted by g4 <ja...@root10.net>.
On Sunday, Aug 10, 2003, at 10:21 Europe/London, Jeff Turner wrote:

> Hi Jason,
>
> This stuff sounds great :)  I look forward to playing with it.

Cool. I obviously now need to apply this to Forrest in general and not
just my project; it's too specific at the moment :)

>
> One thought: how about generalising this extra pipeline, and calling it
> 'metadata' or 'meta' instead of 'head'?
>
> In your implementation, everything in the **head-* pipeline originates in
> the XML <header> tag, and ends up in the HTML <head> tag.  Hence naming
> the pipeline '**head-*' makes sense.  But I think we can generalize this:
>
> - Not all metadata comes from the <header> tag.  For instance, we could:
>   - fetch the page's 'Last Modified' timestamp from the filesystem.
>   - poke CVS and obtain lots of info about a file from there
>   - use intelligent software to parse the XML, infer what concepts are
>     present in the page and automatically generate metadata [1]
>   - Add a 'Creator' field, specifying the Forrest version used to create
>     the page.

Also, do you think that much of the meta content itself could be
generated? For example, the project name ("my-project") could obviously be
used for titles and metadata, as could any page subtitles. That way most of
the metadata is generated from the page itself, and only specifics such as
keywords need to be written by hand.
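
A minimal sketch of that idea, assuming the page layout from the example
document: use the hand-written description when the header has one, and
otherwise derive one from the title and the first subtitle. The stylesheet is
illustrative only and is not part of head2html.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/document">
    <head>
      <meta name="description">
        <xsl:attribute name="content">
          <xsl:choose>
            <!-- Prefer the description written by the page author. -->
            <xsl:when test="header/meta[@name='description']">
              <xsl:value-of
                select="normalize-space(header/meta[@name='description'])"/>
            </xsl:when>
            <!-- Otherwise fall back to "title : first subtitle". -->
            <xsl:otherwise>
              <xsl:value-of
                select="normalize-space(concat(header/title, ' : ',
                                               (//subtitle)[1]))"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
      </meta>
    </head>
  </xsl:template>

</xsl:stylesheet>
```

Keywords are left alone here, since those are the part that still has to be
supplied per page.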

>
> - Not all metadata is used solely in the HTML <head> tag.  I'd like to
>   put the 'Last Modified' date in the page body, like Maven sites (see
>   maven.apache.org) do.

Nice!

>
> So based on this, we could have a '**metadata-*.html' pipeline that
> serves up XML conforming to a standard metadata format like Dublin
> Core (http://dublincore.org/):

Yup, I thought about DC. I hear what you say about generalising this
part.

>
> <metadata xmlns="http://apache/org/forrest/metadata/1.0"
>   xmlns:dc="http://purl.org/dc/elements/1.1/">
>   <dc:title>
>     Essex Conservatories-Direct : The Local Answer To Your
>     Conservatory Needs.
>   </dc:title>
>   <dc:creator>
>     Apache Forrest 0.5
>   </dc:creator>
>   <dc:description>
>     Essex, Quality conservatories and sunrooms direct and online. The
>     Local Answer To Your Conservatory Needs. testing, 1, 2, 3, testing
>     description
>   </dc:description>
>   <dc:publisher>
>     YourCompany
>   </dc:publisher>
>   <dc:identifier>
>     http://yourcompany.com/index.html
>   </dc:identifier>
>   <dc:language>en</dc:language>
>   <dc:date>created: 2002-10-27; modified: 2002-09-20</dc:date>
> </metadata>
>
> There is a list of standard DC elements at
> http://dublincore.org/documents/dces/.
>
>


OK thanks Jeff, I'll get back to you with some improvements ;)

Jason Lane