Posted to users@cocoon.apache.org by Conrad CRAMPTON PSE 52704 <co...@kent.pnn.police.uk> on 2003/10/10 11:56:49 UTC

LinkSerializer, views, spaces and lucene

Hi,
(Using Cocoon 2.1.1 on Tomcat 3.3)

I have an index.xml file, produced by a content management system, that
indexes all the files in a particular directory and includes their file
names as URLs.

viz:
<?xml version="1.0"?>

<index>
<newsitem>
  <link href="9901011200 Latest News - Home.html" />
  <heading>Latest News - Home</heading>
  <publish-date>16 April 2003 </publish-date>
</newsitem>
<newsitem>
  <link href="0304151036 boilers.html" />
  <heading>Police warn of dangers posed by stolen boilers</heading>
  <publish-date>15 April 2003</publish-date>
</newsitem>
<newsitem>
  <link href="0304150952 misper Fonkou.html" />
  <heading>North Kent police search for missing man</heading>
  <publish-date>15 April 2003</publish-date>
</newsitem>
<newsitem>
  <link href="0304141322 MS.html" />
  <heading>Bluewater police and Marks &amp; Spencer team up in property-marking campaign</heading>
  <publish-date>15 April 2003</publish-date>
</newsitem> ......

The hrefs contain spaces because they are derived from the original
file names of Word documents (before those go through a translation
phase, which isn't important here).

It appears that these spaces are causing problems with the links view
and the LinkSerializer when I try to build a search index using the
Lucene sample page for creating an index. I believe the spaces are the
cause because when I manually replace them with %20 in index.xml,
indexing works. I also get these errors in the sitemap log file:

INFO    (2003-10-10) 09:32.44:735   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/SerializeNode: Jumping to view links from serializer at
file:/C:/tomcat/webapps/cocoon/samples/newforceweb/sitemap.xmap:94:32
WARN    (2003-10-10) 09:32.45:055   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=script
RAW=script ATT=src NS= VALUE=js/slide_menu.js
WARN    (2003-10-10) 09:32.45:065   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=link
RAW=link ATT=href NS= VALUE=css/kent_int.css
WARN    (2003-10-10) 09:32.45:065   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=area
RAW=area ATT=href NS= VALUE=../../home.html
WARN    (2003-10-10) 09:32.45:065   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=img
RAW=img ATT=src NS= VALUE=newimages/blackpixel.gif
WARN    (2003-10-10) 09:32.45:075   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=script
RAW=script ATT=src NS= VALUE=js/menu_links.js
WARN    (2003-10-10) 09:32.45:075   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=img
RAW=img ATT=src NS= VALUE=../../images/blackpixel.gif
WARN    (2003-10-10) 09:32.45:085   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=img
RAW=img ATT=src NS= VALUE=../newimages/whitepixel.gif
WARN    (2003-10-10) 09:32.45:085   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=9901011200 Latest News - Home.html
WARN    (2003-10-10) 09:32.45:085   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304151036 boilers.html
WARN    (2003-10-10) 09:32.45:085   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304150952 misper Fonkou.html
WARN    (2003-10-10) 09:32.45:085   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304141322 MS.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304111642 school.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304111528 Medway e-fit.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304111347 amnesty results.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304101542 lathe.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304101537 sheds.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304101525 Spring.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304101311 ASBOs.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304091050 doorstep.html
WARN    (2003-10-10) 09:32.45:095   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=0304081024 Ford Granada.html
WARN    (2003-10-10) 09:32.45:105   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=img
RAW=img ATT=src NS= VALUE=../..images/blackpixel.gif
WARN    (2003-10-10) 09:32.45:105   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=a RAW=a
ATT=href NS= VALUE=#top
WARN    (2003-10-10) 09:32.45:105   [sitemap]
(/cocoon/samples/newforceweb/news/current/9901011200%20Latest%20News%20-%20Home.html)
Thread-24/ExtendedXLinkPipe: Possible internal error: URI= NAME=img
RAW=img ATT=src NS= VALUE=../newimages/blackpixel.gif
WARN    (2003-10-10) 09:57.47:766   [sitemap]
(/cocoon/samples/newforceweb/oldhome.html)
Thread-24/ExcaliburComponentManager: disposing of handler for unreleased
component. role [org.apache.cocoon.serialization.SerializerSelector]
WARN    (2003-10-10) 09:57.47:766   [sitemap]
(/cocoon/samples/newforceweb/oldhome.html)
Thread-24/ExcaliburComponentManager: disposing of handler for unreleased
component. role [org.apache.cocoon.generation.GeneratorSelector]
WARN    (2003-10-10) 09:57.47:766   [sitemap]
(/cocoon/samples/newforceweb/oldhome.html)
Thread-24/ExcaliburComponentManager: disposing of handler for unreleased
component. role [org.apache.cocoon.matching.MatcherSelector]
WARN    (2003-10-10) 09:57.47:766   [sitemap]
(/cocoon/samples/newforceweb/oldhome.html)
Thread-24/ExcaliburComponentManager: disposing of handler for unreleased
component. role [org.apache.cocoon.transformation.TransformerSelector]
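In the log, the request URI appears with %20, but the hrefs handed to the link crawler still carry raw spaces (for example VALUE=9901011200 Latest News - Home.html). A raw space is not a legal URI character, which fits with the manual %20 workaround succeeding. The sketch below uses java.net.URI only to illustrate that point; it is an assumption for illustration, not a claim about what Cocoon's link crawler uses internally, and the class and method names (SpaceInHref, parses, quotePath) are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration: does an href survive java.net.URI parsing?
public class SpaceInHref {

    // Returns true if the string parses as a (possibly relative) URI.
    static boolean parses(String href) {
        try {
            new URI(href);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The five-argument URI constructor quotes characters that are illegal
    // in a path (such as a space) as %XX escapes before parsing the result.
    static String quotePath(String path) {
        try {
            return new URI(null, null, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot quote path: " + path, e);
        }
    }

    public static void main(String[] args) {
        String raw = "9901011200 Latest News - Home.html";
        System.out.println(parses(raw));     // false: raw space is illegal
        String quoted = quotePath(raw);
        System.out.println(quoted);          // 9901011200%20Latest%20News%20-%20Home.html
        System.out.println(parses(quoted));  // true
    }
}
```

The multi-argument URI constructors always quote the '%' character, so they should be given the raw file name, not one that already contains %20, or the result will be double-encoded.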

Ideas for resolving this have been to transform the links in the view
defined in my sitemap, adding a stylesheet ahead of the links serializer
(the serializer has to be the last step in the view):

		<map:view name="links" from-position="last">
			<!-- proposed addition -->
			<map:transform src="index.xsl"/>
			<map:serialize type="links"/>
		</map:view>

or hack LinkSerializer to replace spaces with either %20 or '+'

or beat up users who create these files with spaces in the file names -
preferred option personally, but possible side effects of job loss ;-)
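The transform idea and the LinkSerializer hack both come down to percent-encoding the space before the href reaches the link crawler. A minimal Java sketch under stated assumptions: HrefSpaceFilter, encodeSpaces and rewrittenHrefs are hypothetical names; the class is a plain SAX XMLFilterImpl standing in for a stylesheet or a Cocoon transformer (it does not implement Cocoon's Transformer interface), and wiring encodeSpaces into LinkSerializer is not shown. It uses %20 rather than '+', because '+' means a space only in form-encoded query strings; in a path such as these file names it is a literal plus sign.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX filter that percent-encodes spaces in href/src attributes.
public class HrefSpaceFilter extends XMLFilterImpl {

    // Replace spaces only; '+' would be read as a literal plus in a path.
    static String encodeSpaces(String href) {
        return href.replace(" ", "%20");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts);  // attributes are read-only; copy first
        for (int i = 0; i < copy.getLength(); i++) {
            String name = copy.getQName(i);
            if (name.equals("href") || name.equals("src")) {
                copy.setValue(i, encodeSpaces(copy.getValue(i)));
            }
        }
        super.startElement(uri, localName, qName, copy);  // pass rewritten event downstream
    }

    // Runs the filter over an XML string and collects the href values it emits.
    static List<String> rewrittenHrefs(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        HrefSpaceFilter filter = new HrefSpaceFilter();
        filter.setParent(parser);
        List<String> hrefs = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                String href = atts.getValue("href");
                if (href != null) {
                    hrefs.add(href);
                }
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<index><newsitem><link href=\"0304151036 boilers.html\"/></newsitem>"
                + "<newsitem><link href=\"0304141322 MS.html\"/></newsitem></index>";
        System.out.println(rewrittenHrefs(xml));
        // [0304151036%20boilers.html, 0304141322%20MS.html]
    }
}
```

Only spaces are handled: other characters that are illegal in URIs would need the same treatment, while hrefs that already contain %20 pass through unchanged.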

Sorry for the length of this email.
Conrad