Posted to dev@forrest.apache.org by Rasik Pandey <rb...@gmail.com> on 2005/07/13 20:18:37 UTC
Re: Add support for Googles sitemap protocol?
Ross Gardler wrote:
>> Ferdinand Soethe wrote:
>> Good point. However, I don't think OAI has a "minimal" form, I did some
>> preliminary research into it a few months ago. Let me check it out, I'll
>> report back.
>>
>> However, I'd still like to see support for Google sitemaps since we can
>> do it very quickly and it is more "approachable" than OAI since everyone
>> knows Google.
>>
>> If we go for the Google format, I'd like to suggest to use slightly
>> more than the minimum format in this form (as documented in
>> https://www.google.com/webmasters/sitemaps/docs/en/protocol.html)
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
>> <url>
>> <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
>> <lastmod>2004-11-23</lastmod>
>> </url>
>> </urlset>
>>
>> and include the 'lastmod' right away as that would be the key to speedy
>> updates. Can we do that?
Why not use rss2.0 as the format
http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed ?
> I'd recommend getting the minimal done, then looking at a way of getting
> the lastmod as well.
What do you consider the minimal? In rss <pubDate> and <link> ?
>> Did you see that Google wants the urls to be url encoded? Does our
>> XSLT-engine have a function for that?
>
> http://www.exslt.org/str/functions/encode-uri/index.html
Why not use the
http://cocoon.apache.org/2.1/userdocs/transformers/encodeurl-transformer.html?
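To make the encoding question concrete, here is a minimal Python sketch (not Forrest or Cocoon code) of the two steps a <loc> value needs: percent-encoding of unsafe characters, then XML entity escaping. The namespace is the one quoted above; the function names and the choice of "safe" characters are assumptions made for illustration only.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Namespace as quoted in the thread for the 0.84 schema.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def sitemap_entry(url, lastmod=None):
    """Return one <url> element; lastmod is an optional YYYY-MM-DD string."""
    # Step 1: percent-encode unsafe characters (e.g. spaces) while leaving
    # URL delimiters and existing %-escapes untouched (assumed safe set).
    encoded = quote(url, safe=":/?&=#%")
    # Step 2: entity-escape for XML, so "&" becomes "&amp;".
    lines = ["  <url>", "    <loc>%s</loc>" % escape(encoded)]
    if lastmod:
        lines.append("    <lastmod>%s</lastmod>" % lastmod)
    lines.append("  </url>")
    return "\n".join(lines)


def build_sitemap(entries):
    """entries is an iterable of (url, lastmod-or-None) pairs."""
    body = "\n".join(sitemap_entry(url, lastmod) for url, lastmod in entries)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="%s">\n%s\n</urlset>' % (SITEMAP_NS, body))


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.yoursite.com/catalog?item=83&desc=vacation_usa",
         "2004-11-23"),
    ]))
```

Whichever tool does the work in a real pipeline (EXSLT str:encode-uri or the Cocoon transformer), both steps are needed: a raw "&" in <loc> would make the document ill-formed.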
Regards,
Rus
http://www.discountdracula.com
Re: Add support for Googles sitemap protocol?
Posted by Thorsten Scherler <th...@apache.org>.
On Wed, 2005-07-13 at 23:30 +0100, Ross Gardler wrote:
> Ross Gardler wrote:
> > Rasik Pandey wrote:
>
> ...
>
> >>>> and include the 'lastmod' right away as that would be the key to speedy
> >>
> >>
> >>>> updates. Can we do that?
> >>
> >>
> >> Why not use rss2.0 as the format
> >> http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed
> >> ?
> >
> >
> > It's not the format of the document that is a problem, that part is
> > easy. The hard part is knowing when the page has been regenerated because
> > of a change.
> >
>
> (identifying a potential solution to the problem I identified here...)
>
> Perhaps you can use the XPathDirectoryGenerator [1] to identify when
> files were last modified?
>
...or
<map:generator
name="traverse"
src="org.apache.cocoon.generation.TraversableGenerator"
logger="sitemap.generator.traverse"
label="content"
pool-max="16"
/>
In sitemap.xmap:
<map:generate type="traverse" src="{project:content.xdocs}"/>
gives:
<collection:collection
xmlns:collection="http://apache.org/cocoon/collection/1.0"
name="xdocs"
uri="file:/home/thorsten/src/newSeed/src/documentation/content/xdocs/"
lastModified="1121125614000" date="7/12/05 1:46 AM" size="4096"
sort="name"
reverse="false" requested="true">
<collection:collection name="images"
uri="file:/home/thorsten/src/newSeed/src/documentation/content/xdocs/images/"
lastModified="1121121526000" date="7/12/05 12:38 AM" size="4096">
<collection:resource name="group-logo.gif"
uri="file:/home/thorsten/src/newSeed/src/documentation/content/xdocs/images/group-logo.gif"
lastModified="1121121525000" date="7/12/05 12:38 AM"
size="1092"/>
</collection:collection>
</collection:collection>
http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/generation/TraversableGenerator.html
http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/generation/XPathTraversableGenerator.html
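For illustration, a minimal Python sketch (not Cocoon code) of how the lastModified values in output like the above could be turned into <lastmod> dates. It assumes lastModified is milliseconds since the Unix epoch, which the date attributes shown are consistent with; the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace taken from the generator output shown in the thread.
COLLECTION_NS = "http://apache.org/cocoon/collection/1.0"


def millis_to_w3c_date(millis):
    """Convert epoch milliseconds to a YYYY-MM-DD date in UTC."""
    seconds = int(millis) // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def resource_lastmods(xml_text):
    """Map each collection:resource name to its last-modified date."""
    root = ET.fromstring(xml_text)
    return {
        res.get("name"): millis_to_w3c_date(res.get("lastModified"))
        for res in root.iter("{%s}resource" % COLLECTION_NS)
    }


if __name__ == "__main__":
    sample = (
        '<collection:collection xmlns:collection="%s" name="images" '
        'lastModified="1121121526000">'
        '<collection:resource name="group-logo.gif" '
        'lastModified="1121121525000" size="1092"/>'
        '</collection:collection>' % COLLECTION_NS
    )
    print(resource_lastmods(sample))
```

The sketch formats dates in UTC, so a result can differ by a day from the date attribute if that attribute is rendered in a local timezone.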
> As for using RSS, when we originally discussed this RSS was not one of
> the supported formats, hence we did not discuss it as an option.
> However, it certainly has advantages over a proprietary Google schema.
> So +1 for using that if you intend on implementing this.
>
+1
salu2
> Ross
>
> [1]
> http://cocoon.apache.org/2.1/userdocs/generators/xpathdirectory-generator.html
--
thorsten
"Together we stand, divided we fall!"
Hey you (Pink Floyd)
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
Just put the first version, 0.1-dev, which works with Forrest 0.7 into JIRA.
Ignore the first attachment.
See:
http://issues.apache.org/jira/browse/FOR-597
--
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
"Your Bargain BloodSucka:
Suckin' the Best Deals Outta the Web"
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
>
> > To answer your specific question. Anything defined in a plugin sitemap
> > (internal or otherwise) has the same access limitations that you will
> > find in any Cocoon sitemap. That means:
>
> Which means views are bound to the sitemap in which they are defined and
> leaves us two options, redeclare all pertinent pipelines in the
> internal.xmap
>
> OR add these two pieces to root Forrest sitemap.xmap:
>
> <map:serializers>
> <map:serializer name="links"
> src="org.apache.cocoon.serialization.LinkSerializer">
> <encoding>ISO-8859-1</encoding>
> </map:serializer>
> </map:serializers>
>
> <map:views>
> <map:view name="links" from-position="last">
> <map:serialize type="links"/>
> </map:view>
> </map:views>
>
> So the LinkStatusGenerator will be able to access the requisite
> information. I am NOT a fan of either approach :( as neither is a neat
> implementation.
I agree neither solution is neat. For version 0.1 of your plugin I would
redefine all pertinent pipelines in the internal.xmap. The longer term
solution will be to use the shiny new sitemap block mounting mechanism
once we can upgrade Cocoon in Forrest. As I think I mentioned before
this allows a level of inheritance in sitemaps, complete with the
ability to override items in the super sitemaps. It even allows for
multiple inheritance.
> > What properties do you want?
> >
> > As a hint you can access most of the properties with {project:foo} where
> > foo is defined within forrest.xconf
>
> You mean within the "forrest.properties" file and NOT the
> "forrest.xconf" file, right?
No, I meant forrest.xconf, but my response was certainly confusing. Let
me try to explain.
Within forrest.xconf there is one set of properties that can be accessed
with {forrest:foo} and another set that is accessed with {project:bar}.
The values of these properties are (in some cases) set in
forrest.properties. In other words, the user is not supposed to edit
forrest.xconf.
See the element <component-instance name="defaults"
class="org.apache.forrest.conf.ForrestConfModule"> for the {forrest:foo}
properties and <component-instance name="project"
class="org.apache.forrest.conf.ForrestConfModule"> for the {project:foo}
properties.
The @foo.bar@ tokens are replaced by Ant when Forrest is
launched, and for the most part they have the same names as properties set in
forrest.properties.
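As a rough illustration of the @foo.bar@ replacement described above, here is a minimal Python sketch of token substitution from a properties file into a template. It only mimics the general idea; it is not how Ant or Forrest implements it, and the property name project.example-dir and the template line are hypothetical.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def replace_tokens(template, props):
    """Replace each @name@ token that has a value; leave the rest untouched."""
    return re.sub(
        r"@([\w.\-]+)@",
        lambda match: props.get(match.group(1), match.group(0)),
        template,
    )


if __name__ == "__main__":
    # Hypothetical property and template line, for illustration only.
    props = parse_properties("# a comment\nproject.example-dir=src/docs\n")
    print(replace_tokens("<example-dir>@project.example-dir@</example-dir>", props))
```

Leaving unknown tokens untouched is a design choice of this sketch; it makes a missing property visible in the output instead of silently producing an empty value.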
Ross
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
> To answer your specific question. Anything defined in a plugin sitemap
> (internal or otherwise) has the same access limitations that you will
> find in any Cocoon sitemap. That means:
Which means views are bound to the sitemap in which they are defined and
leaves us two options, redeclare all pertinent pipelines in the
internal.xmap
OR add these two pieces to root Forrest sitemap.xmap:
<map:serializers>
<map:serializer name="links" src="org.apache.cocoon.serialization.LinkSerializer">
<encoding>ISO-8859-1</encoding>
</map:serializer>
</map:serializers>
<map:views>
<map:view name="links" from-position="last">
<map:serialize type="links"/>
</map:view>
</map:views>
So the LinkStatusGenerator will be able to access the requisite information.
I am NOT a fan of either approach :( as neither is a neat implementation.
> What properties do you want?
>
> As a hint you can access most of the properties with {project:foo} where
> foo is defined within forrest.xconf
You mean within the "forrest.properties" file and NOT the "forrest.xconf"
file, right?
That is what I am currently using. I declared a
#project.siteBaseURI=http://www.discountdracula.com
and use it as you described in my internal.xmap.
> Looks like you are stuck with a standard internal plugin for the
> foreseeable future.
Tough luck...
--
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
"Your Bargain BloodSucka:
Suckin' the Best Deals Outta the Web"
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> Hi Ross,
>
> > Agghhhh!!! There's that term "views" again.
> >
> > We have a real problem here in Forrest at the moment. Views are being
> > used to refer to two different things (views in Eclipse and views, the
> > replacement for skins). Now we seem to have a third use, I assume this
> > is sitemap views (funnily enough I said this may become a conflict in
> > another mail earlier today, never thought it would already have happened
> > in my inbox).
>
> So with an internal plugin (internal.xmap) will the views (cocoon
> map:views) be propagated downward to the main Forrest sitemap? I am new
> to "plugins" so if my question contains misconceptions, please let me
> know. Or are views still orthogonal(bound) to specific sitemaps as in
> Cocoon?
Hehe, you see the confusion we are causing using views to refer to so
many different things (are you listening Thorsten?)
To answer your specific question: anything defined in a plugin sitemap
(internal or otherwise) has the same access limitations that you will
find in any Cocoon sitemap. That means:
if you use cocoon:/ it will only search within the root sitemap (i.e.
Forrest's sitemap.xmap)
if you use cocoon:// it will search in all subsitemaps starting from the
root sitemap
> > I'm going to delay responding to this because one of the things that
> > happened at the Hackathon is that Ferdinand looked at an alternative
> > method of identifying which pages were regenerated in a run. Perhaps we
> > should wait to see if he thinks it can be applied here.
>
> Any news on this front?
Ferdinand is away for a week. We won't hear anything until he returns
sometime next week.
> > Internal plugins are for this kind of thing. However, as they stand they
> > don't make the code much more maintainable.
>
> I think I have the most basic functionality (not tested) implemented in
> an internal plugin. What is the simplest means of pulling values from a
> plugin's forrest.properties and accessing them in the internal.xmap? Is
> there an InputModule which reads .properties files?
What properties do you want?
As a hint you can access most of the properties with {project:foo} where
foo is defined within forrest.xconf
> > A possible solution is to use Cocoons new block sitemap loading
> > features. This was demonstrated to me at the Hackathon (thanks Daniel),
> > it provides a way for plugins to extend existing sitemaps (and other
> > plugins). I need to do some experimentation with this so you should
> > proceed with an internal plugin for now, we'll address the
> > maintainability when experiments are complete.
>
> Ok thanks.
Unfortunately there is a problem with the CLI in Cocoon Head,
consequently we cannot update Cocoon in Forrest just yet, therefore I
can't do the above experimentation yet. Cheche is working with the Cocoon
folk to get the CLI sorted out.
Looks like you are stuck with a standard internal plugin for the
foreseeable future.
Ross
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
Hi Ross,
> Agghhhh!!! There's that term "views" again.
>
> We have a real problem here in Forrest at the moment. Views are being
> used to refer to two different things (views in Eclipse and views, the
> replacement for skins). Now we seem to have a third use, I assume this
> is sitemap views (funnily enough I said this may become a conflict in
> another mail earlier today, never thought it would already have happened
> in my inbox).
So with an internal plugin (internal.xmap), will the views (Cocoon map:views)
be propagated downward to the main Forrest sitemap? I am new to "plugins", so
if my question contains misconceptions, please let me know. Or are views
still orthogonal (bound) to specific sitemaps as in Cocoon?
> I'm going to delay responding to this because one of the things that
> happened at the Hackathon is that Ferdinand looked at an alternative
> method of identifying which pages were regenerated in a run. Perhaps we
> should wait to see if he thinks it can be applied here.
Any news on this front?
> Internal plugins are for this kind of thing. However, as they stand they
> don't make the code much more maintainable.
I think I have the most basic functionality (not tested) implemented in an
internal plugin. What is the simplest means of pulling values from a
plugin's forrest.properties and accessing them in the internal.xmap? Is
there an InputModule which reads .properties files?
> A possible solution is to use Cocoons new block sitemap loading
> features. This was demonstrated to me at the Hackathon (thanks Daniel),
> it provides a way for plugins to extend existing sitemaps (and other
> plugins). I need to do some experimentation with this so you should
> proceed with an internal plugin for now, we'll address the
> maintainability when experiments are complete.
Ok thanks.
--
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
"Your Bargain BloodSucka:
Suckin' the Best Deals Outta the Web"
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> Hello,
>
> Ross wrote:
> >>http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java
> > > would do the trick, although it would have to be modified to make a call
> > > to get the "last-modified" header, so hopefully we could get that added
> > > to a future release of cocoon. With a quick examination of the code, it
> > > looks like it will crawl a URL and generate an xml report, allowing
> > > includes and excludes expressions.
> >
> > We can create a new generator that extends the LinkStatusGenerator and
> > house it here. If Cocoon want it we will remove it from here at a later
> > date.
>
> I have a few issues with the approach which I proposed, hopefully you
> can help me make some sense about my reservations. First, since views
> are orthogonal to each sitemap the cocoon-view=links required by the
> LinkStatusGenerator and therefore must be declared in the parent sitemap
> which does the core matching.
Agghhhh!!! There's that term "views" again.
We have a real problem here in Forrest at the moment. Views are being
used to refer to two different things (views in Eclipse and views, the
replacement for skins). Now we seem to have a third use, I assume this
is sitemap views (funnily enough I said this may become a conflict in
another mail earlier today, never thought it would already have happened
in my inbox).
I'm going to delay responding to this because one of the things that
happened at the Hackathon is that Ferdinand looked at an alternative
method of identifying which pages were regenerated in a run. Perhaps we
should wait to see if he thinks it can be applied here.
> Secondly, in building such a plugin for
> forrest, the plugin-sitemap would have to override/redeclare a number of
> pipeline match(s) in order to have access/provide to the necessary
> request header,"last-modified", as there doesn't seem to be a generic
> means for providing this information and passing it up to parent
> pipeline match(s) so they are conditionally added to the request
> headers. Overriding many pipelines this way doesn't seem very maintainable.
> Would anyone have any ideas on how to achieve this or an alternative
> approach?
Internal plugins are for this kind of thing. However, as they stand they
don't make the code much more maintainable.
A possible solution is to use Cocoons new block sitemap loading
features. This was demonstrated to me at the Hackathon (thanks Daniel),
it provides a way for plugins to extend existing sitemaps (and other
plugins). I need to do some experimentation with this so you should
proceed with an internal plugin for now, we'll address the
maintainability when experiments are complete.
Ross
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
Hello,
Ross wrote:
>>
http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java
> > would do the trick, although it would have to be modified to make a call
> > to get the "last-modified" header, so hopefully we could get that added
> > to a future release of cocoon. With a quick examination of the code, it
> > looks like it will crawl a URL and generate an xml report, allowing
> > includes and excludes expressions.
>
> We can create a new generator that extends the LinkStatusGenerator and
> house it here. If Cocoon want it we will remove it from here at a later
> date.
I have a few issues with the approach I proposed; hopefully you can
help me make sense of my reservations. First, since views are bound to
each sitemap, the cocoon-view=links view required by the
LinkStatusGenerator must be declared in the parent sitemap
that does the core matching. Secondly, in building such a plugin for
Forrest, the plugin sitemap would have to override/redeclare a number of
pipeline matches in order to provide the necessary request
header, "last-modified", as there doesn't seem to be a generic means of
providing this information and passing it up to the parent pipeline matches so
that it is conditionally added to the request headers. Overriding many
pipelines this way doesn't seem very maintainable. Would anyone have any ideas on
how to achieve this or an alternative approach?
> This should not be configured from skinconf.xml. There are a few reasons for
> this, firstly it has nothing to do with the skin, which is about the
> look and feel of the site. Secondly because skins (and therefore
> skinconf.xml) are being deprecated in 0.8 in favour of views. Finally
> this should be an output plugin and therefore needs to be configured
> from the plugin.
Point taken.
> The variables in the sitemap, such as {project:stylesheets} are defined
> in forrest.xconf and are given values from forrest.properties during Ant
> script (the init target if I remember correctly). However, since this is
> a plugin we do not want to be adding new config values to
> forrest.properties. So again, the config needs to be in the plugin.
Agreed...
> How do you do that?
>
> We don't know yet. We have discussed it quite a few times but have not
> yet come up with a final solution. Although Thorsten's recent work on
> View configuration has made some of the options we talked about possible.
>
> Since we don't yet know a solution for this perhaps we can work the
> other way around. When you get to the point of needing to add these
> configurations tell us exactly what you need to do and we will use it as
> a use case for defining the per plugin configs.
Will do, thanks.
--
Regards,
Rus
www.discountdracula.com <http://www.discountdracula.com>
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> > This is a good point. How about also providing a generator that
> > would get the last modified header of remote resources. The results of
> > the two could be aggregated together.
>
> I think
> http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java
> would do the trick, although it would have to be modified to make a call
> to get the "last-modified" header, so hopefully we could get that added
> to a future release of cocoon. With a quick examination of the code, it
> looks like it will crawl a URL and generate an xml report, allowing
> includes and excludes expressions.
We can create a new generator that extends the LinkStatusGenerator and
house it here. If Cocoon wants it, we will remove it from here at a later
date.
> > However, this still is not totally robust, because some remote resources
> > will always indicate that they have changed even when the content has
> > not (for example Daisy tracks changes to meta-data that Forrest does not
> > currently use).
>
> What strategy do you propose to handle this case if any?
This is a special case. I would not worry about it just yet. The Daisy
plugin is still in the whiteboard anyway. In fact the one that is in SVN
right now would work with the above approach, it is the one on my hard
drive that would have a problem.
> >> I may need some assistance to know how to build in hooks from
> >> skinconf.xml to the sitemap format generation.
>
> > I'm not sure what you mean by that. But there are plenty of people here
> > to answer your questions as they arise.
>
> I am sure there will be a need to allow users to specify a configuration
> for this like the includes/excludes on the LinkStatusGenerator crawls
> and maybe the <changefreq> value for the google sitemap format. Can you
> give me a quick overview of how params make it from the skinconf.xml to
> the sitemap(s) or xsl(s)?
This should not be configured from skinconf.xml. There are a few reasons for
this, firstly it has nothing to do with the skin, which is about the
look and feel of the site. Secondly because skins (and therefore
skinconf.xml) are being deprecated in 0.8 in favour of views. Finally
this should be an output plugin and therefore needs to be configured
from the plugin.
The variables in the sitemap, such as {project:stylesheets}, are defined
in forrest.xconf and are given values from forrest.properties during the Ant
script (the init target if I remember correctly). However, since this is
a plugin we do not want to be adding new config values to
forrest.properties. So again, the config needs to be in the plugin.
How do you do that?
We don't know yet. We have discussed it quite a few times but have not
yet come up with a final solution. Although Thorsten's recent work on
View configuration has made some of the options we talked about possible.
Since we don't yet know a solution for this perhaps we can work the
other way around. When you get to the point of needing to add these
configurations tell us exactly what you need to do and we will use it as
a use case for defining the per plugin configs.
Ross
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
Hi Ross,
> This is a good point. How about also providing a generator that
> would get the last modified header of remote resources. The results of
> the two could be aggregated together.
I think
http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java
would do the trick, although it would have to be modified to make a call to
get the "last-modified" header, so hopefully we could get that added to a
future release of cocoon. With a quick examination of the code, it looks
like it will crawl a URL and generate an xml report, allowing includes and
excludes expressions.
> However, this still is not totally robust, because some remote resources
> will always indicate that they have changed even when the content has
> not (for example Daisy tracks changes to meta-data that Forrest does not
> currently use).
What strategy do you propose to handle this case if any?
>> Are you familiar with
>>
http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html
>> , the documentation is skimpy, but it may be what we need to handle both
>> static and dynamic cases.
> No I'm not familiar. I wonder what the docs mean by "status". Will it
> provide the last modified header as suggested above?
See above...
> I don't have the time to experiment with it now, but I (and I am sure
> other devs) would love to hear about your findings.
See above...
>> I may need some assistance to know how to build in hooks from
>> skinconf.xml to the sitemap format generation.
> I'm not sure what you mean by that. But there are plenty of people here
> to answer your questions as they arise.
I am sure there will be a need to allow users to specify a configuration for
this like the includes/excludes on the LinkStatusGenerator crawls and maybe
the <changefreq> value for the google sitemap format. Can you give me a
quick overview of how params make it from the skinconf.xml to the sitemap(s)
or xsl(s)?
Regards,
Rus
http://www.discountdracula.com
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> >> It's not the format of the document that is a problem, that part is
> >> easy. The hard part is knowing when the page has been regenerated because
> >> of a change.
> >>
>
> (identifying a potential solution to the problem I identified here...)
>
> > Perhaps you can use the XPathDirectoryGenerator [1] to identify when
> > files were last modified?
>
>
> My case uses dynamically generated xml files, so I don't think this is a
> robust solution...?
This is a good point. How about also providing a generator that
would get the last modified header of remote resources? The results of
the two could be aggregated together.
However, this still is not totally robust, because some remote resources
will always indicate that they have changed even when the content has
not (for example Daisy tracks changes to meta-data that Forrest does not
currently use).
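The aggregation idea can be sketched in Python as follows (not a Cocoon generator). It assumes dates from local files and dates from HTTP Last-Modified headers are both reduced to YYYY-MM-DD strings keyed by URL; the function names and the "newest date wins" rule are assumptions for illustration.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def lastmod_from_header(value):
    """Turn an HTTP Last-Modified header value into YYYY-MM-DD, or None."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: treat the modification date as unknown.
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d")


def aggregate(local_dates, remote_dates):
    """Merge two {url: date} maps, keeping the newest date per URL."""
    merged = dict(local_dates)
    for url, date in remote_dates.items():
        if date is None:
            continue
        # ISO dates compare correctly as plain strings.
        if url not in merged or date > merged[url]:
            merged[url] = date
    return merged


if __name__ == "__main__":
    remote = {"/feed.html": lastmod_from_header("Mon, 11 Jul 2005 22:38:45 GMT")}
    print(aggregate({"/index.html": "2005-07-10"}, remote))
```

A resource that always reports a fresh Last-Modified value, as in the Daisy case mentioned above, would still appear changed on every run; the sketch does nothing to detect that.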
> Are you familiar with
> http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html
> , the documentation is skimpy, but it may be what we need to handle both
> static and dynamic cases.
No I'm not familiar. I wonder what the docs mean by "status". Will it
provide the last modified header as suggested above?
I don't have the time to experiment with it now, but I (and I am sure
other devs) would love to hear about your findings.
> I think both the google and RSS formats are simple enough to provide.
> Although, I may need some assistance to know how to build in hooks from
> skinconf.xml to the sitemap format generation.
I'm not sure what you mean by that. But there are plenty of people here
to answer your questions as they arise.
Ross
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
>
> >> It's not the format of the document that is a problem, that part is
> >> easy. The hard part is knowing when the page has been regenerated because
> >> of a change.
> >>
(identifying a potential solution to the problem I identified here...)
>
> > Perhaps you can use the XPathDirectoryGenerator [1] to identify when
> > files were last modified?
My case uses dynamically generated xml files, so I don't think this is a
robust solution...? Are you familiar with
http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html ,
the documentation is skimpy, but it may be what we need to handle both
static and dynamic cases.
> As for using RSS, when we originally discussed this RSS was not one of
> > the supported formats, hence we did not discuss it as an option.
> > However, it certainly has advantages over a proprietary Google schema.
> > So +1 for using that if you intend on implementing this.
I think both the google and RSS formats are simple enough to provide.
Although, I may need some assistance to know how to build in hooks from
skinconf.xml to the sitemap format generation.
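For the RSS side, a minimal Python sketch of a feed whose items carry <link> and <pubDate>. It is not the plugin's code; the channel values and function names are placeholders. The sketch also emits a <title> per item, since RSS 2.0 requires each item to have at least a title or a description.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def rss_item(title, link, modified):
    """Build one <item>; modified must be a timezone-aware UTC datetime."""
    # RSS 2.0 dates use the RFC 822 style, e.g. "Mon, 11 Jul 2005 23:46:54 GMT".
    pub_date = format_datetime(modified, usegmt=True)
    return ("    <item>\n"
            "      <title>%s</title>\n"
            "      <link>%s</link>\n"
            "      <pubDate>%s</pubDate>\n"
            "    </item>" % (escape(title), escape(link), pub_date))


def build_feed(channel_title, channel_link, description, items):
    """items is an iterable of (title, link, modified) tuples."""
    body = "\n".join(rss_item(*item) for item in items)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<rss version="2.0">\n'
            '  <channel>\n'
            '    <title>%s</title>\n'
            '    <link>%s</link>\n'
            '    <description>%s</description>\n'
            '%s\n'
            '  </channel>\n'
            '</rss>' % (escape(channel_title), escape(channel_link),
                        escape(description), body))


if __name__ == "__main__":
    when = datetime(2005, 7, 11, 23, 46, 54, tzinfo=timezone.utc)
    # Placeholder channel and item values, for illustration only.
    print(build_feed("Example site", "http://example.org/", "Page updates",
                     [("Index", "http://example.org/index.html", when)]))
```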
Regards,
Rus
http://www.discountdracula.com
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Ross Gardler wrote:
> Rasik Pandey wrote:
...
>>>> and include the 'lastmod' right away as that would be the key to speedy
>>
>>
>>>> updates. Can we do that?
>>
>>
>> Why not use rss2.0 as the format
>> http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed
>> ?
>
>
> It's not the format of the document that is a problem, that part is
> easy. The hard part is knowing when the page has been regenerated because
> of a change.
>
(identifying a potential solution to the problem I identified here...)
Perhaps you can use the XPathDirectoryGenerator [1] to identify when
files were last modified?
As for using RSS, when we originally discussed this RSS was not one of
the supported formats, hence we did not discuss it as an option.
However, it certainly has advantages over a proprietary Google schema.
So +1 for using that if you intend on implementing this.
Ross
[1]
http://cocoon.apache.org/2.1/userdocs/generators/xpathdirectory-generator.html
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
>
>
> > I was talking about the name "sitemap". Perhaps use "google-map".
Me too. Since I am currently using a cocoon://abs-linkmap call to generate
the source for conversion into both the Google sitemap format and the RSS
format, I thought linkmap-{sitemap format}.xml might be appropriate.
> Argh, sorry. I have added that to my list of issues
> > to talk to the Cocoon people at ApacheCon.
>
> > Workarounds? The only one that i can think of is
> > to make an inconspicuous link from one of the docs,
> > e.g. <a href="blah.html">.</a>
>
> >i.e. not from site.xml or it will create a menu item.
Sure, I was doing that with some hidden links before you responded
with the cli.xconf reference ;)
Regards,
Rus
http://www.discountdracula.com
Re: Add support for Googles sitemap protocol?
Posted by David Crossley <cr...@apache.org>.
Rasik Pandey wrote:
> >
> > > However please do not use the name "sitemap".
> > > We cannot afford confusion between this and
> > > the real Cocoon "sitemap".
>
> Do you have any naming preferences linkmap-google.xml and linkmap-rss.xml or
> others?
I was talking about the name "sitemap". Perhaps use "google-map".
> >> I'm afraid I don't recall the answer to this and I am going to bed right
> > >> now. Someone will hopefully answer, but I'm pretty sure it has been
> > >> asked before you might find something in the archives.
> >
> > http://forrest.apache.org/faq.html#cli-xconf
>
> Unfortunately there is an annoying bug with this approach, see:
> http://issues.apache.org/jira/browse/FOR-480
>
> Does anyone have a work-around or patch for this?
Argh, sorry. I have added that to my list of issues
to talk to the Cocoon people at ApacheCon.
Workarounds? The only one that I can think of is
to make an inconspicuous link from one of the docs,
e.g. <a href="blah.html">.</a>
i.e. not from site.xml or it will create a menu item.
-David
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
>
>
> > However please do not use the name "sitemap".
> > We cannot afford confusion between this and
> > the real Cocoon "sitemap".
Do you have any naming preferences: linkmap-google.xml and linkmap-rss.xml, or
others?
>> I'm afraid I don't recall the answer to this and I am going to bed right
> >> now. Someone will hopefully answer, but I'm pretty sure it has been
> >> asked before you might find something in the archives.
>
> http://forrest.apache.org/faq.html#cli-xconf
Unfortunately there is an annoying bug with this approach, see:
http://issues.apache.org/jira/browse/FOR-480
Does anyone have a work-around or patch for this?
Regards,
Rus
http://www.discountdracula.com
Re: Add support for Googles sitemap protocol?
Posted by David Crossley <cr...@apache.org>.
Ross Gardler wrote:
> Rasik Pandey wrote:
>
> >I already have a functioning version of this abs-linkmap --> linkmap.rss
> >and abs-linkmap --> sitemap.xml, but this ,
>
> Wow, that would make a cool output plugin (making plugins is really easy;
> if you don't already know how, see
> http://forrest.apache.org/docs_0_70/howto/howto-buildPlugin.html )
However please do not use the name "sitemap".
We cannot afford confusion between this and
the real Cocoon "sitemap".
> >One more question, how can I force Forrest to generate files like
> >linkmap.rss and/or sitemap.xml without having links to them from my
> >pages or my site.xml?
>
> I'm afraid I don't recall the answer to this and I am going to bed right
> now. Someone will hopefully answer, but I'm pretty sure it has been
> asked before, so you might find something in the archives.
http://forrest.apache.org/faq.html#cli-xconf
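
As a rough sketch of what that FAQ entry points at (an illustration under
assumptions, not a tested configuration): the Cocoon command-line
configuration can list extra URIs to process even when nothing links to
them. The fragment below assumes a project-local copy of cli.xconf and uses
the output names from the question above; the exact attributes your Forrest
version expects may differ.

```xml
<!-- Fragment of a project-local cli.xconf (sketch only). -->
<!-- Each uri element asks the Cocoon CLI to request that path directly, -->
<!-- so it gets generated without a link from any page or from site.xml. -->
<!-- The file names are the ones discussed in this thread. -->
<uri src="linkmap.rss"/>
<uri src="sitemap.xml"/>
```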
-David
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
...
> I already have a functioning version of this abs-linkmap --> linkmap.rss
> and abs-linkmap --> sitemap.xml, but this ,
Wow, that would make a cool output plugin (making plugins is really easy;
if you don't already know how, see
http://forrest.apache.org/docs_0_70/howto/howto-buildPlugin.html )
> One more question, how can I force Forrest to generate files like
> linkmap.rss and/or sitemap.xml without having links to them from my
> pages or my site.xml?
I'm afraid I don't recall the answer to this and I am going to bed right
now. Someone will hopefully answer, but I'm pretty sure it has been
asked before, so you might find something in the archives.
Ross
Re: Add support for Googles sitemap protocol?
Posted by Rasik Pandey <rb...@gmail.com>.
> > It's not the format of the document that is a problem, that part is
> > easy. The hard part is knowing when the page has been regenerated because
> > of a change.
True, especially in my case, where 90% of my pages are generated from RSS or
XML data feeds funneled through my project sitemap.xmap.
> The minimum required by Google, i.e. those marked required in the following:
> http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#xmlTagDefinitions
In our case, <loc> and <changefreq> (as long as there is no dependency on the
value of <lastmod>) would seem to be the best options given we don't have a
solid solution for the last-modified date.
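
As a rough illustration of that option (a sketch, not something posted in
the thread), an entry carrying only <loc> and <changefreq> might look like
the following. The URL is the placeholder from the earlier example, the
"daily" value is an assumption chosen for illustration, and the raw "&" is
written as "&amp;" so that the file stays well-formed XML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: namespace taken from the 0.84 example quoted in this thread. -->
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <!-- Placeholder URL; the ampersand in the query string is escaped. -->
    <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
    <!-- Assumed value; the protocol allows always, hourly, daily, weekly, -->
    <!-- monthly, yearly and never. No lastmod is emitted. -->
    <changefreq>daily</changefreq>
  </url>
</urlset>
```

Leaving out <lastmod> sidesteps the last-modified problem, at the cost of
giving the crawler only a hint about how often to come back.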
> >> Why not use the
> >> http://cocoon.apache.org/2.1/userdocs/transformers/encodeurl-transformer.html?
> >
> > Why not indeed. Thanks for the pointer.
I already have a functioning version of this (abs-linkmap --> linkmap.rss and
abs-linkmap --> sitemap.xml), but this transformer,
http://cocoon.apache.org/2.1/userdocs/transformers/augment-transformer.html,
may be handy for building absolute URLs between the dynamic and static
contexts.
One more question, how can I force Forrest to generate files like
linkmap.rss and/or sitemap.xml without having links to them from my pages or
my site.xml?
Regards,
Rus
http://www.discountdracula.com
Re: Add support for Googles sitemap protocol?
Posted by Ross Gardler <rg...@apache.org>.
Rasik Pandey wrote:
> Ross Gardler wrote:
>
>>> Ferdinand Soethe wrote:
>>> Good point. However, I don't think OAI has a "minimal" form, I did some
>>> preliminary research into it a few months ago. Let me check it out, I'll
>>> report back.
>>>
>>> However, I'd still like to see support for Google sitemaps since we can
>>> do it very quickly and it is more "approachable" than OAI since everyone
>>> knows Google.
>>>
>>> If we go for the Google format, I'd like to suggest to use slightly
>>> more than the minimum format in this form (as documented in
>>> https://www.google.com/webmasters/sitemaps/docs/en/protocol.html)
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
>>> <url>
>>> <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
>>> <lastmod>2004-11-23</lastmod>
>>> </url>
>>> </urlset>
>>>
>>> and include the 'lastmod' right away as that would be the key to speedy
>>> updates. Can we do that?
>
> Why not use rss2.0 as the format
> http://www.google.com/webmasters/sitemaps/docs/en/other.html#feed ?
It's not the format of the document that is a problem, that part is
easy. The hard part is knowing when the page has been regenerated because
of a change.
>> I'd recommend getting the minimal done, then looking at a way of getting
>> the lastmod as well.
>
> What do you consider the minimal? In rss <pubDate> and <link> ?
The minimum required by Google, i.e. those marked required in the following:
http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#xmlTagDefinitions
(or, if we used RSS instead, whatever is required in that format).
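
For comparison, a minimal RSS 2.0 document could be sketched as below. This
is an illustration under assumptions, not output from Forrest: the channel
title, description and URLs are placeholders, and the date reuses the one
from the earlier example. RSS 2.0 requires <title>, <link> and <description>
on the channel, and each item needs at least a <title> or a <description>;
<link> and <pubDate> are optional on items.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all titles, descriptions and URLs are placeholders. -->
<rss version="2.0">
  <channel>
    <!-- The three channel elements that RSS 2.0 requires. -->
    <title>Example site</title>
    <link>http://www.yoursite.com/</link>
    <description>Pages of the example site</description>
    <item>
      <!-- An item needs a title or a description; link and pubDate are optional. -->
      <title>Catalog item 83</title>
      <link>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</link>
      <!-- RSS dates use the RFC 822 format. -->
      <pubDate>Tue, 23 Nov 2004 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```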
>>> Did you see that Google wants the urls to be url encoded? Does our
>>> XSLT-engine have a function for that?
>>
>> http://www.exslt.org/str/functions/encode-uri/index.html
>
> Why not use the
> http://cocoon.apache.org/2.1/userdocs/transformers/encodeurl-transformer.html?
Why not indeed. Thanks for the pointer.
Ross