Posted to dev@forrest.apache.org by Rasik Pandey <rb...@gmail.com> on 2005/07/13 20:18:37 UTC

Re: Add support for Googles sitemap protocol?

Ross Gardler wrote:

>> Ferdinand Soethe wrote:
>> Good point. However, I don't think OAI has a "minimal" form, I did some 
>> preliminary research into it a few months ago. Let me check it out, I'll 
>> report back.
>>
>> However, I'd still like to see support for Google sitemaps since we can 
>> do it very quickly and it is more "approachable" than OAI since everyone 
>> knows Google.
>>
>> If we go for the Google format, I'd like to suggest to use slightly
>> more than the minimum format in this form (as documented in
>> https://www.google.com/webmasters/sitemaps/docs/en/protocol.html)
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
>>    <url>
>>       <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
>>       <lastmod>2004-11-23</lastmod>
>>    </url>
>> </urlset>
>> 
>> and include the 'lastmod' right away as that would be the key to speedy
>> updates. Can we do that?

Why not use rss2.0 as the format
http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed ?


> I'd recommend getting the minimal done, then looking at a way of getting 
> the lastmod as well.

What do you consider the minimal? In rss <pubDate> and <link> ?
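
As a rough illustration of what a near-minimal RSS 2.0 entry could look like, here is a Python sketch. The helper name rss_item and the sample title are made up for this example. RSS 2.0 requires each item to carry at least a title or a description, so a title is included alongside <link> and <pubDate>.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def rss_item(title, link, modified):
    """Return one RSS 2.0 <item> string (illustrative helper, not Forrest code)."""
    # RSS 2.0 dates follow the RFC 822 style, which format_datetime emits.
    pub_date = format_datetime(modified, usegmt=True)
    return "<item><title>%s</title><link>%s</link><pubDate>%s</pubDate></item>" % (
        escape(title), escape(link), pub_date)


item = rss_item(
    "Vacation catalog",  # hypothetical page title
    "http://www.yoursite.com/catalog?item=83&desc=vacation_usa",
    datetime(2004, 11, 23, tzinfo=timezone.utc))
print(item)
```

The link is entity-escaped in the same way as the <loc> value in the Google example quoted above.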

>> Did you see that Google wants the urls to be url encoded? Does our
>> XSLT-engine have a function for that?
>
> http://www.exslt.org/str/functions/encode-uri/index.html

Why not use the
http://cocoon.apache.org/2.1/userdocs/transformers/encodeurl-transformer.html?
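
Whichever encoder is used, two separate steps are involved: percent-encoding the URL and entity-escaping it for XML. The Python sketch below only illustrates those two steps outside of Cocoon; the helper name sitemap_url_entry and the choice of "safe" characters are assumptions, not anything taken from Forrest or from Google's documentation.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_url_entry(loc, lastmod=None):
    """Return one <url> element as a string (illustrative helper only)."""
    # Step 1: percent-encode unsafe characters, keeping the reserved
    # characters that give the URL its structure. '%' is kept as safe so
    # sequences that are already encoded are not encoded twice.
    encoded = quote(loc, safe=":/?&=#%")
    # Step 2: entity-escape for XML, so '&' becomes '&amp;'.
    lines = ["   <url>", "      <loc>%s</loc>" % escape(encoded)]
    if lastmod is not None:
        lines.append("      <lastmod>%s</lastmod>" % lastmod)
    lines.append("   </url>")
    return "\n".join(lines)


entry = sitemap_url_entry(
    "http://www.yoursite.com/catalog?item=83&desc=vacation usa", "2004-11-23")
print(entry)
```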

Regards,
Rus
http://www.discountdracula.com

Re: Add support for Googles sitemap protocol?

Posted by Thorsten Scherler <th...@apache.org>.
On Wed, 2005-07-13 at 23:30 +0100, Ross Gardler wrote:
> Ross Gardler wrote:
> > Rasik Pandey wrote:
> 
> ...
> 
> >>>> and include the 'lastmod' right away as that would be the key to speedy
> >>
> >>
> >>>> updates. Can we do that?
> >>
> >>
> >> Why not use rss2.0 as the format 
> >> http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed
> >>  ?
> > 
> > 
> > It's not the format of the document that is a problem, that part is 
> > easy. The hard part is knowing when the page has been regenerated because 
> > of a change.
> > 
> 
> (identifying a potential solution to the problem I identified here...)
> 
> Perhaps you can use the XPathDirectoryGenerator [1] to identify when 
> files were last modified?
> 

...or
<map:generator 
        name="traverse" 
        src="org.apache.cocoon.generation.TraversableGenerator" 
        logger="sitemap.generator.traverse" 
        label="content" 
        pool-max="16"
      />

In sitemap.xmap:
<map:generate type="traverse" src="{project:content.xdocs}"/> 
gives:
<collection:collection
    xmlns:collection="http://apache.org/cocoon/collection/1.0"
    name="xdocs"
    uri="file:/home/thorsten/src/newSeed/src/documentation/content/xdocs/"
    lastModified="1121125614000" date="7/12/05 1:46 AM" size="4096"
    sort="name" reverse="false" requested="true">
  <collection:collection name="images"
      uri="file:/home/thorsten/src/newSeed/src/documentation/content/xdocs/images/"
      lastModified="1121121526000" date="7/12/05 12:38 AM" size="4096">
    <collection:resource name="group-logo.gif"
        uri="file:/home/thorsten/src/newSeed/src/documentation/content/xdocs/images/group-logo.gif"
        lastModified="1121121525000" date="7/12/05 12:38 AM" size="1092"/>
  </collection:collection>
</collection:collection>

http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/generation/TraversableGenerator.html
http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/generation/XPathTraversableGenerator.html
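
The lastModified values in that output appear to be milliseconds since the Unix epoch (they agree with the date attributes once the local timezone is taken into account). A Python sketch of turning them into the YYYY-MM-DD form used by 'lastmod' could look like this; the function name is made up, the sample is a trimmed copy of the output above, and UTC is an assumption, so the day can differ from the local-time date attribute.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"c": "http://apache.org/cocoon/collection/1.0"}

# Trimmed copy of the generator output shown above.
SAMPLE = """<collection:collection
    xmlns:collection="http://apache.org/cocoon/collection/1.0"
    name="xdocs" lastModified="1121125614000">
  <collection:collection name="images" lastModified="1121121526000">
    <collection:resource name="group-logo.gif" lastModified="1121121525000"/>
  </collection:collection>
</collection:collection>"""


def lastmod_by_resource(xml_text):
    """Map each resource name to a YYYY-MM-DD date taken from lastModified."""
    root = ET.fromstring(xml_text)
    result = {}
    for res in root.iterfind(".//c:resource", NS):
        millis = int(res.get("lastModified"))
        # Assumed to be epoch milliseconds; converted in UTC.
        stamp = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        result[res.get("name")] = stamp.strftime("%Y-%m-%d")
    return result


dates = lastmod_by_resource(SAMPLE)
print(dates)
```

Inside a Cocoon pipeline the same conversion would presumably happen in a stylesheet or transformer; the sketch only shows the arithmetic.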

> As for using RSS, when we originally discussed this RSS was not one of 
> the supported formats, hence we did not discuss it as an option. 
> However, it certainly has advantages over a proprietary Google schema. 
> So +1 for using that if you intend on implementing this.
> 

+1

salu2

> Ross
> 
> [1] 
> http://cocoon.apache.org/2.1/userdocs/generators/xpathdirectory-generator.html
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
Just put the first version, 0.1-dev, which works with Forrest 0.7 into JIRA. 
Ignore the first attachment.

See:
http://issues.apache.org/jira/browse/FOR-597


-- 
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
"Your Bargain BloodSucka:
Suckin' the Best Deals Outta the Web"

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> 
>  > To answer your specific question. Anything defined in a plugin sitemap
>  > (internal or otherwise) has the same access limitations that you will
>  > find in any Cocoon sitemap. That means:
> 
> Which means views are bound to the sitemap in which they are defined and 
> leaves us two options, redeclare all pertinent pipelines in the 
> internal.xmap
> 
> OR add these two pieces to root Forrest sitemap.xmap:
> 
> <map:serializers>
> <map:serializer name="links" 
> src="org.apache.cocoon.serialization.LinkSerializer">
> <encoding>ISO-8859-1</encoding>
> </map:serializer>
> </map:serializers>
> 
> <map:views>
> <map:view name="links" from-position="last">
> <map:serialize type="links"/>
> </map:view>
> </map:views>
> 
> So the LinkStatusGenerator will be able to access the requisite 
> information. I am NOT a fan of  either approach :( as neither is a neat 
> implementation.

I agree neither solution is neat. For version 0.1 of your plugin I would 
redefine all pertinent pipelines in the internal.xmap. The longer term 
solution will be to use the shiny new sitemap block mounting mechanism 
once we can upgrade Cocoon in Forrest. As I think I mentioned before 
this allows a level of inheritance in sitemaps, complete with the 
ability to override items in the super sitemaps. It even allows for 
multiple inheritance.

>  > What properties do you want?
>  >
>  > As a hint you can access most of the properties with {project:foo} where
>  > foo is defined within forrest.xconf
> 
> You mean within the "forrest.properties" file and NOT the 
> "forrest.xconf" file, right?

No, I meant forrest.xconf, but my response was certainly confusing. Let 
me try and explain.

Within forrest.xconf there is one set of properties that can be accessed 
with {forrest:foo} and another set that is accessed with {project:bar}. 
The values of these properties are (in some cases) set in 
forrest.properties. In other words, the user is not supposed to edit 
forrest.xconf.

See the element <component-instance name="defaults" 
class="org.apache.forrest.conf.ForrestConfModule"> for the {forrest:foo} 
properties and <component-instance name="project" 
class="org.apache.forrest.conf.ForrestConfModule"> for the {project:foo} 
properties.

The @foo.bar@ entries are tokens that are replaced by Ant when Forrest is 
launched, but in the main they have the same names as properties set in 
forrest.properties.
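
To make the @foo.bar@ substitution concrete, here is a small Python sketch of that style of token replacement. It is not Ant's or Forrest's actual code; the property name and value below are hypothetical.

```python
import re


def replace_tokens(template, properties):
    """Substitute @name@ tokens with values from a properties mapping."""
    def lookup(match):
        key = match.group(1)
        # Leave unknown tokens untouched rather than guessing a value.
        return properties.get(key, match.group(0))
    return re.sub(r"@([A-Za-z0-9_.\-]+)@", lookup, template)


# Hypothetical property name and value, for illustration only.
props = {"project.example-dir": "src/example"}
line = "<example-dir>@project.example-dir@</example-dir>"
print(replace_tokens(line, props))
```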

Ross

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
> To answer your specific question. Anything defined in a plugin sitemap
> (internal or otherwise) has the same access limitations that you will
> find in any Cocoon sitemap. That means:

Which means views are bound to the sitemap in which they are defined and 
leaves us two options, redeclare all pertinent pipelines in the 
internal.xmap

OR add these two pieces to root Forrest sitemap.xmap:

<map:serializers>
  <map:serializer name="links"
      src="org.apache.cocoon.serialization.LinkSerializer">
    <encoding>ISO-8859-1</encoding>
  </map:serializer>
</map:serializers>

<map:views>
<map:view name="links" from-position="last">
<map:serialize type="links"/>
</map:view>
</map:views>

So the LinkStatusGenerator will be able to access the requisite information. 
I am NOT a fan of either approach :( as neither is a neat implementation.




> What properties do you want?
> 
> As a hint you can access most of the properties with {project:foo} where
> foo is defined within forrest.xconf

You mean within the "forrest.properties" file and NOT the "forrest.xconf" 
file, right?

That is what I am currently using. I declared a

#project.siteBaseURI=http://www.discountdracula.com

and use it as you described in my internal.xmap.

> Looks like you are stuck with a standard internal plugin for the
> foreseeable future.

Tough luck...


-- 
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
"Your Bargain BloodSucka:
Suckin' the Best Deals Outta the Web"

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> Hi Ross,
> 
>  > Agghhhh!!! There's that term "views" again.
>  >
>  > We have a real problem here in Forrest at the moment. Views are being
>  > used to refer to two different things (views in Eclipse and views, the
>  > replacement for skins). Now we seem to have a third use, I assume this
>  > is sitemap views (funnily enough I said this may become a conflict in
>  > another mail earlier today, never thought it would already have happened
>  > in my inbox).
> 
> So with an internal plugin (internal.xmap) will the views (cocoon 
> map:views) be propagated downward to the main Forrest sitemap? I am new 
> to "plugins" so if my question contains misconceptions, please let me 
> know. Or are views still orthogonal(bound) to specific sitemaps as in 
> Cocoon?

Hehe, you see the confusion we are causing using views to refer to so 
many different things (are you listening Thorsten?)

To answer your specific question. Anything defined in a plugin sitemap 
(internal or otherwise) has the same access limitations that you will 
find in any Cocoon sitemap. That means:

if you use cocoon:/ it will only search within the root sitemap (i.e. 
Forrest's sitemap.xmap)

if you use cocoon:// it will search in all subsitemaps starting from the 
root sitemap

>  > I'm going to delay responding to this because one of the things that
>  > happened at the Hackathon is that Ferdinand looked at an alternative
>  > method of identifying which pages were regenerated in a run. Perhaps we
>  > should wait to see if he thinks it can be applied here.
> 
> Any news on this front?

Ferdinand is away for a week. We won't hear anything until he returns 
sometime next week.

>  > Internal plugins are for this kind of thing. However, as they stand they
>  > don't make the code much more maintainable.
> 
> I think I have the most basic functionality (not tested)  implemented in 
> an internal plugin. What is the simplest means of pulling values from a 
> plugin's forrest.properties and accessing them in the internal.xmap? Is 
> there an InputModule which reads .properties files?

What properties do you want?

As a hint you can access most of the properties with {project:foo} where 
foo is defined within forrest.xconf

>  > A possible solution is to use Cocoons new block sitemap loading
>  > features. This was demonstrated to me at the Hackathon (thanks Daniel),
>  > it provides a way for plugins to extend existing sitemaps (and other
>  > plugins). I need to do some experimentation with this so you should
>  > proceed with an internal plugin for now, we'll address the
>  > maintainability when experiments are complete.
> 
> Ok thanks.

Unfortunately there is a problem with the CLI in Cocoon Head, 
consequently we cannot update Cocoon in Forrest just yet, therefore I 
can't do the above experimentation yet. Cheche is working with the Cocoon 
folk to get the CLI sorted out.

Looks like you are stuck with a standard internal plugin for the 
foreseeable future.

Ross

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
Hi Ross,

> Agghhhh!!! There's that term "views" again.
>
> We have a real problem here in Forrest at the moment. Views are being
> used to refer to two different things (views in Eclipse and views, the
> replacement for skins). Now we seem to have a third use, I assume this
> is sitemap views (funnily enough I said this may become a conflict in
> another mail earlier today, never thought it would already have happened
> in my inbox).

So with an internal plugin (internal.xmap) will the views (cocoon map:views) 
be propagated downward to the main Forrest sitemap? I am new to "plugins" so 
if my question contains misconceptions, please let me know. Or are views 
still orthogonal(bound) to specific sitemaps as in Cocoon?


> I'm going to delay responding to this because one of the things that
> happened at the Hackathon is that Ferdinand looked at an alternative
> method of identifying which pages were regenerated in a run. Perhaps we
> should wait to see if he thinks it can be applied here.

Any news on this front?

> Internal plugins are for this kind of thing. However, as they stand they
> don't make the code much more maintainable.

I think I have the most basic functionality (not tested) implemented in an 
internal plugin. What is the simplest means of pulling values from a 
plugin's forrest.properties and accessing them in the internal.xmap? Is 
there an InputModule which reads .properties files?

> A possible solution is to use Cocoons new block sitemap loading
> features. This was demonstrated to me at the Hackathon (thanks Daniel),
> it provides a way for plugins to extend existing sitemaps (and other
> plugins). I need to do some experimentation with this so you should
> proceed with an internal plugin for now, we'll address the
> maintainability when experiments are complete.

Ok thanks.


-- 
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
"Your Bargain BloodSucka:
Suckin' the Best Deals Outta the Web"

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> Hello,
> 
> Ross wrote:
>  >>http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java 
> <http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java>
>  > > would do the trick, although it would have to be modified to make a 
> call
>  > > to get the "last-modified" header, so hopefully we could get that added
>  > > to a future release of cocoon. With a quick examination of the 
> code, it
>  > > looks like it will crawl a URL and generate an xml report, allowing
>  > > includes and excludes expressions.
>  >
>  > We can create a new generator that extends the LinkStatusGenerator and
>  > house it here. If Cocoon want it we will remove it from here at a later
>  > date.
> 
> I have a few issues with the approach which I proposed, hopefully you 
> can help me make some sense of my reservations. First, since views 
> are orthogonal to each sitemap, the cocoon-view=links view is required by the 
> LinkStatusGenerator and therefore must be declared in the parent sitemap 
> which does the core matching.

Agghhhh!!! There's that term "views" again.

We have a real problem here in Forrest at the moment. Views are being 
used to refer to two different things (views in Eclipse and views, the 
replacement for skins). Now we seem to have a third use, I assume this 
is sitemap views (funnily enough I said this may become a conflict in 
another mail earlier today, never thought it would already have happened 
in my inbox).

I'm going to delay responding to this because one of the things that 
happened at the Hackathon is that Ferdinand looked at an alternative 
method of identifying which pages were regenerated in a run. Perhaps we 
should wait to see if he thinks it can be applied here.

> Secondly, in building such a plugin for 
> forrest, the plugin-sitemap would have to override/redeclare a number of 
> pipeline match(s)  in order to have access/provide to the necessary 
> request header,"last-modified", as there doesn't seem to be a generic 
> means for providing this information and passing it up to parent 
> pipeline match(s) so they are conditionally added to the request 
> headers. Overriding many pipelines doesn't seem very maintainable. 
> Would anyone have any ideas on how to achieve this or an alternative 
> approach?

Internal plugins are for this kind of thing. However, as they stand they 
don't make the code much more maintainable.

A possible solution is to use Cocoon's new block sitemap loading 
features. This was demonstrated to me at the Hackathon (thanks Daniel), 
it provides a way for plugins to extend existing sitemaps (and other 
plugins). I need to do some experimentation with this so you should 
proceed with an internal plugin for now, we'll address the 
maintainability when experiments are complete.

Ross

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
Hello,

Ross wrote:
>>
http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java
> > would do the trick, although it would have to be modified to make a call
> > to get the "last-modified" header, so hopefully we could get that added
> > to a future release of cocoon. With a quick examination of the code, it
> > looks like it will crawl a URL and generate an xml report, allowing
> > includes and excludes expressions.
> 
> We can create a new generator that extends the LinkStatusGenerator and
> house it here. If Cocoon want it we will remove it from here at a later
> date.

I have a few issues with the approach which I proposed, hopefully you can 
help me make some sense of my reservations. First, since views are 
orthogonal to each sitemap, the cocoon-view=links view is required by the 
LinkStatusGenerator and therefore must be declared in the parent sitemap 
which does the core matching. Secondly, in building such a plugin for 
forrest, the plugin-sitemap would have to override/redeclare a number of 
pipeline match(es) in order to access/provide the necessary request 
header, "last-modified", as there doesn't seem to be a generic means for 
providing this information and passing it up to parent pipeline match(es) so 
they are conditionally added to the request headers. Overriding many 
pipelines doesn't seem very maintainable. Would anyone have any ideas on 
how to achieve this or an alternative approach?

> This should not be configured from skinconf.xml. There are a few reasons for
> this, firstly it has nothing to do with the skin, which is about the
> look and feel of the site. Secondly because skins (and therefore
> skinconf.xml) are being deprecated in 0.8 in favour of views. Finally
> this should be an output plugin and therefore needs to be configured
> from the plugin.

Point taken.

> The variables in the sitemap, such as {project:stylesheets} are defined
> in forrest.xconf and are given values from forrest.properties during Ant
> script (the init target if I remember correctly). However, since this is
> a plugin we do not want to be adding new config values to
> forrest.properties. So again, the config needs to be in the plugin.

Agreed...

> How do you do that?
> 
> We don't know yet. We have discussed it quite a few times but have not
> yet come up with a final solution. Although Thorsten's recent work on
> View configuration has made some of the options we talked about possible.
> 
> Since we don't yet know a solution for this perhaps we can work the
> other way around. When you get to the point of needing to add these
> configurations tell us exactly what you need to do and we will use it as
> a use case for defining the per plugin configs.

Will do, thanks.

-- 
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:

>  > This is a good point. How about also also providing a generator that
>  > would get the last modified header of remote resources. The results of
>  > the two could be aggregated together.
> 
> I think 
> http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java 
> <http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java> 
> would do the trick, although it would have to be modified to make a call 
> to get the "last-modified" header, so hopefully we could get that added 
> to a future release of cocoon. With a quick examination of the code, it 
> looks like it will crawl a URL and generate an xml report, allowing 
> includes and excludes expressions.

We can create a new generator that extends the LinkStatusGenerator and 
house it here. If Cocoon want it we will remove it from here at a later 
date.

>  > However, this still is not totally robust, because some remote resources
>  > will always indicate that they have changed even when the content has
>  > not (for example Daisy tracks changes to meta-data that Forrest does not
>  > currently use).
> 
> What strategy do you propose to handle this case if any?

This is a special case. I would not worry about it just yet. The Daisy 
plugin is still in the whiteboard anyway. In fact the one that is in SVN 
right now would work with the above approach, it is the one on my hard 
drive that would have a problem.

>   >> I may need some assistance to know how to build in hooks from
>   >> skinconf.xml to the sitemap format generation.
>  
>  > I'm not sure what you mean by that. But there are plenty of people here
>  > to answer your questions as they arise.
> 
> I am sure there will be a need to allow users to specify a configuration 
> for this like the includes/excludes on the LinkStatusGenerator crawls 
> and maybe the <changefreq> value for the google sitemap format.  Can you 
> give me a quick overview of how params make it from the skinconf.xml to 
> the sitemap(s) or xsl(s)?

This should not be configured from skinconf.xml. There are a few reasons for 
this, firstly it has nothing to do with the skin, which is about the 
look and feel of the site. Secondly because skins (and therefore 
skinconf.xml) are being deprecated in 0.8 in favour of views. Finally 
this should be an output plugin and therefore needs to be configured 
from the plugin.

The variables in the sitemap, such as {project:stylesheets} are defined 
in forrest.xconf and are given values from forrest.properties during Ant 
script (the init target if I remember correctly). However, since this is 
a plugin we do not want to be adding new config values to 
forrest.properties. So again, the config needs to be in the plugin.

How do you do that?

We don't know yet. We have discussed it quite a few times but have not 
yet come up with a final solution. Although Thorsten's recent work on 
View configuration has made some of the options we talked about possible.

Since we don't yet know a solution for this perhaps we can work the 
other way around. When you get to the point of needing to add these 
configurations tell us exactly what you need to do and we will use it as 
a use case for defining the per plugin configs.

Ross

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
Hi Ross,
> This is a good point. How about also providing a generator that
> would get the last modified header of remote resources. The results of
> the two could be aggregated together.
I think 
http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java
would do the trick, although it would have to be modified to make a call to
get the "last-modified" header, so hopefully we could get that added to a 
future release of cocoon. With a quick examination of the code, it looks 
like it will crawl a URL and generate an xml report, allowing includes and 
excludes expressions.
 
> However, this still is not totally robust, because some remote resources
> will always indicate that they have changed even when the content has
> not (for example Daisy tracks changes to meta-data that Forrest does not
> currently use).
What strategy do you propose to handle this case if any?

 >> Are you familiar with 
>> 
http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html
>> , the documentation is skimpy, but it may be what we need to handle both
>> static and dynamic cases.
> No I'm not familiar. I wonder what the docs mean by "status". Will it 
 > provide the last modified header as suggested above?

See above...

> I don't have the time to experiment with it now, but I (and I am sure
> other devs) would love to hear about your findings.

See above... 

>> I may need some assistance to know how to build in hooks from
 >> skinconf.xml to the sitemap format generation.
 > I'm not sure what you mean by that. But there are plenty of people here
> to answer your questions as they arise.

I am sure there will be a need to allow users to specify a configuration for 
this like the includes/excludes on the LinkStatusGenerator crawls and maybe 
the <changefreq> value for the google sitemap format. Can you give me a 
quick overview of how params make it from the skinconf.xml to the sitemap(s) 
or xsl(s)?


Regards,
Rus
http://www.discountdracula.com

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
>      >> It's not the format of the document that is a problem, that part is
>      >> easy. The hard part is knowing when the page has been regnerated
>     because
>      >> of a change.
>      >>
> 
>     (identifying a potential solution to the problem I identified here...)
> 
>      > Perhaps you can use the XPathDirectoryGenerator [1] to identify when
>      > files were last modified?
> 
> 
> My case uses dynamically generated xml files, so I don't think this is a 
> robust solution...? 

This is a good point. How about also providing a generator that 
would get the last modified header of remote resources. The results of 
the two could be aggregated together.

However, this still is not totally robust, because some remote resources 
will always indicate that they have changed even when the content has 
not (for example Daisy tracks changes to meta-data that Forrest does not 
currently use).

> Are you familiar with 
> http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html 
> , the documentation is skimpy, but it may be what we need to handle both 
> static and dynamic cases.

No I'm not familiar. I wonder what the docs mean by "status". Will it 
provide the last modified header as suggested above?

I don't have the time to experiment with it now, but I (and I am sure 
other devs) would love to hear about your findings.

> I think both the google and RSS formats are simple enough to provide. 
> Although, I may need some assistance to know how to build in hooks from 
> skinconf.xml to the sitemap format generation.

I'm not sure what you mean by that. But there are plenty of people here 
to answer your questions as they arise.

Ross

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
> 
> >> It's not the format of the document that is a problem, that part is
> >> easy. The hard part is knowing when the page has been regenerated 
> because
> >> of a change.
> >>

(identifying a potential solution to the problem I identified here...)
> 
> > Perhaps you can use the XPathDirectoryGenerator [1] to identify when
> > files were last modified?


My case uses dynamically generated xml files, so I don't think this is a 
robust solution...? Are you familiar with 
http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html , 
the documentation is skimpy, but it may be what we need to handle both 
static and dynamic cases.

> As for using RSS, when we originally discussed this RSS was not one of
> > the supported formats, hence we did not discuss it as an option.
> > However, it certainly has advantages over a proprietary Google schema.
> > So +1 for using that if you intend on implementing this.


I think both the google and RSS formats are simple enough to provide. 
Although, I may need some assistance to know how to build in hooks from 
skinconf.xml to the sitemap format generation.
Regards,
Rus
http://www.discountdracula.com

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Ross Gardler wrote:
> Rasik Pandey wrote:

...

>>>> and include the 'lastmod' right away as that would be the key to speedy
>>
>>
>>>> updates. Can we do that?
>>
>>
>> Why not use rss2.0 as the format 
>> http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed
>>  ?
> 
> 
> It's not the format of the document that is a problem, that part is 
> easy. The hard part is knowing when the page has been regenerated because 
> of a change.
> 

(identifying a potential solution to the problem I identified here...)

Perhaps you can use the XPathDirectoryGenerator [1] to identify when 
files were last modified?

As for using RSS, when we originally discussed this RSS was not one of 
the supported formats, hence we did not discuss it as an option. 
However, it certainly has advantages over a proprietary Google schema. 
So +1 for using that if you intend on implementing this.

Ross

[1] 
http://cocoon.apache.org/2.1/userdocs/generators/xpathdirectory-generator.html

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
> 
> 
> > I was talking about the name "sitemap". Perhaps use "google-map".


Me too. Since, I am currently using a cocoon://abs-linkmap call to generate 
the source for conversion into both the google sitemap format and the rss 
format, I thought linkmap-{sitemap format}.xml might be appropriate.

> Argh, sorry. I have added that to my list of issues
> > to talk to the Cocoon people at ApacheCon.
> 
> > Workarounds? The only one that i can think of is
> > to make an inconspicuous link from one of the docs,
> > e.g. <a href="blah.html">.</a>
> 
> >i.e. not from site.xml or it will create a menu item.


Sure, I was doing that before with some hidden links before you responded 
with the cli.xconf reference ;)

Regards,
Rus
http://www.discountdracula.com

Re: Add support for Googles sitemap protocol?

Posted by David Crossley <cr...@apache.org>.
Rasik Pandey wrote:
> > 
> > > However please do not use the name "sitemap".
> > > We cannot afford confusion between this and
> > > the real Cocoon "sitemap".
> 
> Do you have any naming preferences linkmap-google.xml and linkmap-rss.xml or 
> others?

I was talking about the name "sitemap". Perhaps use "google-map".

> >> I'm afraid I don't recall the answer to this and I am going to bed right
> > >> now. Someone will hopefully answer, but I'm pretty sure it has been
> > >> asked before you might find something in the archives.
> > 
> > http://forrest.apache.org/faq.html#cli-xconf
> 
> Unfortunately there is an annoying bug with this approach, see:
> http://issues.apache.org/jira/browse/FOR-480
> 
> Does anyone have a work-around or patch for this?

Argh, sorry. I have added that to my list of issues
to talk to the Cocoon people at ApacheCon.

Workarounds? The only one that i can think of is
to make an inconspicuous link from one of the docs,
e.g. <a href="blah.html">.</a>

i.e. not from site.xml or it will create a menu item.

-David

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
> 
> 
> > However please do not use the name "sitemap".
> > We cannot afford confusion between this and
> > the real Cocoon "sitemap".


Do you have any naming preferences linkmap-google.xml and linkmap-rss.xml or 
others?

> >> I'm afraid I don't recall the answer to this and I am going to bed right
> >> now. Someone will hopefully answer, but I'm pretty sure it has been
> >> asked before; you might find something in the archives.
> 
> http://forrest.apache.org/faq.html#cli-xconf


Unfortunately there is an annoying bug with this approach, see:
http://issues.apache.org/jira/browse/FOR-480

Does anyone have a work-around or patch for this?

Regards,
Rus
http://www.discountdracula.com

Re: Add support for Googles sitemap protocol?

Posted by David Crossley <cr...@apache.org>.
Ross Gardler wrote:
> Rasik Pandey wrote:
> 
> >I already have a functioning version of this abs-linkmap --> linkmap.rss 
> >and abs-linkmap --> sitemap.xml, but this , 
> 
> Wow, that would make a cool output plugin (making plugins is really easy; 
> if you don't already know how, see 
> http://forrest.apache.org/docs_0_70/howto/howto-buildPlugin.html )

However please do not use the name "sitemap".
We cannot afford confusion between this and
the real Cocoon "sitemap".

> >One more question, how can I force Forrest to generate files like 
> >linkmap.rss and/or sitemap.xml without having links to them from my 
> >pages or my site.xml?
> 
> I'm afraid I don't recall the answer to this and I am going to bed right 
> now. Someone will hopefully answer, but I'm pretty sure it has been 
> asked before; you might find something in the archives.

http://forrest.apache.org/faq.html#cli-xconf
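
A rough sketch of the kind of entry that FAQ points towards, assuming a project-local copy of cli.xconf and the Cocoon CLI <uri src="..."/> element. The file name is just the one mentioned in this thread; check the FAQ for the exact steps before relying on it:

```xml
<!-- Excerpt from a project-local cli.xconf (assumed location and contents). -->
<!-- A <uri> entry asks the Cocoon CLI to render a resource even when -->
<!-- no page and no site.xml entry links to it. -->
<uri src="linkmap.rss"/>
```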

-David

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:

...

> I already have a functioning version of this abs-linkmap --> linkmap.rss 
> and abs-linkmap --> sitemap.xml, but this , 

Wow, that would make a cool output plugin (making plugins is really easy; 
if you don't already know how, see 
http://forrest.apache.org/docs_0_70/howto/howto-buildPlugin.html )


> One more question, how can I force Forrest to generate files like 
> linkmap.rss and/or sitemap.xml without having links to them from my 
> pages or my site.xml?

I'm afraid I don't recall the answer to this and I am going to bed right 
now. Someone will hopefully answer, but I'm pretty sure it has been 
asked before; you might find something in the archives.

Ross

Re: Add support for Googles sitemap protocol?

Posted by Rasik Pandey <rb...@gmail.com>.
> 
> 
> > It's not the format of the document that is a problem, that part is 
> > easy. The hard part is knowing when the page has been regenerated because
> > of a change.


True, especially in my case, where 90% of my pages are generated from RSS or 
XML data feeds funneled through my project sitemap.xmap.

> > The minimum required by Google, i.e. those marked required in the following:
> > 
> > http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#xmlTagDefinitions



In our case, <loc> and <changefreq> (as long as there is no dependency on the 
value of <lastmod>) would seem to be the best options given we don't have a 
solid solution for the last-modified date. 
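
A minimal sketch of what that could look like in the 0.84 format quoted earlier in the thread. The URL and the "weekly" value are placeholders for illustration only, not something we have agreed on:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
   <url>
      <!-- <loc> is the only element the protocol marks as required -->
      <loc>http://www.example.org/index.html</loc>
      <!-- optional hint; used here in place of <lastmod> -->
      <changefreq>weekly</changefreq>
   </url>
</urlset>
```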


> >> Why not use the
> >> http://cocoon.apache.org/2.1/userdocs/transformers/encodeurl-transformer.html?
> > 
> > Why not indeed. Thanks for the pointer. 


I already have a functioning version of this (abs-linkmap --> linkmap.rss and 
abs-linkmap --> sitemap.xml), but this transformer, 
http://cocoon.apache.org/2.1/userdocs/transformers/augment-transformer.html, 
may be handy for building absolute URLs between the dynamic and static 
contexts.
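
For anyone following along, here is a rough sketch of how such an abs-linkmap --> sitemap.xml transform could be written in XSLT 1.0. It is not the version mentioned above. It assumes the linkmap is a tree of elements carrying site-relative href attributes, and the site-root parameter and its example.org value are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.google.com/schemas/sitemap/0.84">

  <!-- Hypothetical parameter: public base URL of the published site -->
  <xsl:param name="site-root" select="'http://www.example.org/'"/>

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <urlset>
      <!-- Take every element with a non-empty href, skipping external
           links (those containing ':') and fragment links (containing '#') -->
      <xsl:for-each select="//*[@href != ''
                                and not(contains(@href, ':'))
                                and not(contains(@href, '#'))]">
        <url>
          <loc><xsl:value-of select="concat($site-root, @href)"/></loc>
        </url>
      </xsl:for-each>
    </urlset>
  </xsl:template>

</xsl:stylesheet>
```

This sketch does not remove duplicate hrefs or URL-encode the values, so both would still need handling.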

One more question, how can I force Forrest to generate files like 
linkmap.rss and/or sitemap.xml without having links to them from my pages or 
my site.xml?

Regards,
Rus
http://www.discountdracula.com

Re: Add support for Googles sitemap protocol?

Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> Ross Gardler wrote:
> 
>>> Ferdinand Soethe wrote:
>>> Good point. However, I don't think OAI has a "minimal" form, I did some 
>>> preliminary research into it a few months ago. Let me check it out, I'll 
>>> report back.
>>>
>>> However, I'd still like to see support for Google sitemaps since we can 
>>> do it very quickly and it is more "approachable" than OAI since everyone 
>>> knows Google.
>>>
>>> If we go for the Google format, I'd like to suggest to use slightly
>>> more than the minimum format in this form (as documented in
>>> https://www.google.com/webmasters/sitemaps/docs/en/protocol.html)
>>> 
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
>>>    <url>
>>>       <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
>>>       <lastmod>2004-11-23</lastmod>
>>>    </url>
>>> </urlset>
>>> 
>>> and include the 'lastmod' right away as that would be the key to speedy
>>> updates. Can we do that?
> 
> Why not use rss2.0 as the format
> http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed ?

It's not the format of the document that is a problem, that part is 
easy. The hard part is knowing when the page has been regenerated because 
of a change.

>> I'd recommend getting the minimal done, then looking at a way of getting 
>> the lastmod as well.
> 
> What do you consider the minimal? In rss <pubDate> and <link> ?

The minimum required by Google, i.e. those marked required in the following:

http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#xmlTagDefinitions

(or if we used RSS instead whatever is required in that format).

>>> Did you see that Google wants the urls to be url encoded? Does our
>>> XSLT-engine have a function for that?
>>
>> http://www.exslt.org/str/functions/encode-uri/index.html
> 
> Why not use the 
> http://cocoon.apache.org/2.1/userdocs/transformers/encodeurl-transformer.html?

Why not indeed. Thanks for the pointer.

Ross