Posted to user@lenya.apache.org by Andreas Hartmann <an...@apache.org> on 2008/11/20 11:30:05 UTC
Re: [Feedmodule] How to declare an entity in a Java transformer?
Hi André,
Florent André wrote:
> I would like to parse locally downloaded (via <xi:include parse="text">)
> HTML pages.
I'm afraid this approach will only cause a lot of headaches. I'd rather
recommend using the HTMLGenerator [1] to parse the files. In your
XInclude statement you can just call the HTMLGenerator pipeline using
the cocoon:/ protocol.
[1] http://cocoon.apache.org/2.1/userdocs/html-generator.html
HTH,
-- Andreas
>
> After download, <xi:include> gives me an "escaped" HTML file.
>
> I strip the <!DOCTYPE ... > with a regex, but now the unescape transformer
> throws this error:
> "Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was
> referenced, but not declared."
>
> I found this on the internet: "To allow the use of &nbsp; in your stylesheet,
> you have to declare it first: <!DOCTYPE xsl:stylesheet [<!ENTITY nbsp
> "&#160;">]>"
>
> How can I add this declaration in the Java unescape transformer?
>
> I think I can remove every &nbsp; with a regex, but I would like to
> understand better how the Java transformer works.
>
> Thanks and have a good day.
>
> Florent
--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@lenya.apache.org
For additional commands, e-mail: user-help@lenya.apache.org
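For reference, the error quoted above can be reproduced outside Cocoon with the JDK's own XML parser. The following is only a minimal sketch, not the code of the unescape transformer: the class name NbspEntityDemo, the wrapper element div and the helper parseText are assumptions made for illustration. It shows that prepending an internal DTD subset which declares nbsp lets the same fragment parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class NbspEntityDemo {

    // Internal DTD subset declaring the entity before it is used.
    // The root name "div" is a hypothetical wrapper element.
    static final String DOCTYPE =
        "<!DOCTYPE div [<!ENTITY nbsp \"&#160;\">]>";

    // Parses an XML string and returns the text content of its root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String fragment = "<div>a&nbsp;b</div>";

        // Without a declaration the parser rejects the entity reference.
        try {
            parseText(fragment);
        } catch (SAXParseException e) {
            System.out.println("without DOCTYPE: " + e.getMessage());
        }

        // With the declaration, &nbsp; is expanded to U+00A0.
        String text = parseText(DOCTYPE + fragment);
        System.out.println("with DOCTYPE, nbsp expanded: " + text.equals("a\u00A0b"));
    }
}
```

The declaration has to come before the root element of whatever string is handed to the parser, which is why stripping the original DOCTYPE with a regex and not replacing it leads to the exception.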
Escape string for XML (was Re: [Feedmodule] How to declare an entity in a Java transformer?)
Posted by Thorsten Scherler <th...@apache.org>.
On Fri, 2009-03-13 at 11:32 +0100, Florent André wrote:
...
> Thanks Andreas, it works with include... but only for "simple" web addresses
> (without ? and &).
>
> I solved the ? problem with a "bidouille" (~= trick):
> -------- prepareinclude.xsl:
> * replace the ? with /post--parameter/ using a regex
> * create <include
> src="cocoon://module/webagent/retrivepipe/www/without/http/post--parameter/parameters
>
I saw that you found another solution; however, please see
http://commons.apache.org/lang/ and
http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html
There you will find all the escape methods one could wish for.
salu2
--
Thorsten Scherler <thorsten.at.apache.org>
Open Source <consulting, training and solutions>
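To illustrate the kind of escaping meant here without adding a dependency, the sketch below hand-rolls the five predefined XML entities. It is a stand-in written for this note, not the Commons Lang implementation: the class name XmlEscapeSketch and its escapeXml method are hypothetical, and the library's StringEscapeUtils.escapeXml(String) may treat other characters differently.

```java
class XmlEscapeSketch {

    // Escapes the five predefined XML entities in a single pass over the
    // input, so the '&' of an entity written to the output is never
    // escaped a second time.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://example.org/page?a=1&amp;b=2
        System.out.println(escapeXml("http://example.org/page?a=1&b=2"));
    }
}
```

With Commons Lang on the classpath, the call would simply be StringEscapeUtils.escapeXml(url) in place of the hand-written method.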
Re: [Feedmodule] How to declare an entity in a Java transformer?
Posted by Florent André <fl...@4sengines.com>.
Simpler is better!
After a few broken keyboards, I came across Cocoon's HTMLTransformer... and it
felt like "I saw an angel"! :)
If you want to download and transform a wide range of web pages (URLs
with ? and &; pages with framesets, or with missing </img> (!)), you can do this:
--- a sources.xsl:
<escaped-html>
<i:include parse="text" src="http://www.adress" />
</escaped-html>
--- in sitemap.xmap
* in:
<map:components>
  <map:transformers default="xslt">
ADD:
    <map:transformer
        name="html"
        logger="sitemap.transformer.html"
        src="org.apache.cocoon.transformation.HTMLTransformer">
      <!-- Tidy configuration file -->
      <jtidy-config>fallback://lenya/modules/fckeditor/config/jtidy.properties</jtidy-config>
    </map:transformer>
* in:
<map:pipelines>
  <map:pipeline type="noncaching">
    <map:match pattern="XXXXXX">
ADD:
      <map:generate src="test/sources.xml"/>
      <map:transform type="include"/>
      <map:transform type="html">
        <map:parameter name="tags" value="escaped-html"/>
      </map:transform>
And now... back to work for my boss! :p)
Have a good weekend
Re: [Feedmodule] How to declare an entity in a Java transformer?
Posted by Florent André <fl...@4sengines.com>.
Hi Lenya friends,
On Thu, 20 Nov 2008 22:10:05 +0100, Andreas Hartmann <an...@apache.org>
wrote:
> Hi André,
>
> Florent André wrote:
>> Thanks for this pointer!
>>
>> HTMLGenerator works like a charm!
>>
>> But when I try to call this HTMLGenerator from an XInclude... it doesn't
>> work! :(
>
> Does it work with the IncludeTransformer?
>
> http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/transformation/IncludeTransformer.html
>
> -- Andreas
>
Thanks Andreas, it works with include... but only for "simple" web addresses
(without ? and &).
I solved the ? problem with a "bidouille" (~= trick):
-------- prepareinclude.xsl:
* replace the ? with /post--parameter/ using a regex
* create <include
src="cocoon://module/webagent/retrivepipe/www/without/http/post--parameter/parameters
--------- webagent's sitemap.xmap
* <map:match pattern="retrivepipe/**/post-parameter/**/">
* <map:generate src="http://{1}/post-parameter/{2}" type="html"/> // call to HTMLGenerator
* ...
* </map:match>
But I can't find any solution for the &:
- this character is translated into &amp; in my XSLT, and HTMLGenerator
doesn't do the &amp; ==> & transformation...
Do you have a suggestion?
Have a good day
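The ? replacement described above can be pictured as a small round trip. This is only a sketch of the idea in plain Java, under assumptions: the real rewrite happens in prepareinclude.xsl and in the sitemap matcher, and the class name PipelinePathSketch, the methods toPipelinePath and toUrl, and the example host are hypothetical. It uses the /post--parameter/ marker from the text and does nothing about the & problem.

```java
class PipelinePathSketch {

    // Marker that stands in for '?' inside the cocoon:// pipeline path.
    static final String MARKER = "/post--parameter/";

    // Turns "host/path?query" (no scheme) into "host/path/post--parameter/query".
    static String toPipelinePath(String urlWithoutScheme) {
        int q = urlWithoutScheme.indexOf('?');
        if (q < 0) {
            return urlWithoutScheme; // nothing to rewrite
        }
        return urlWithoutScheme.substring(0, q) + MARKER
             + urlWithoutScheme.substring(q + 1);
    }

    // Reverses the rewrite and restores the http:// scheme, which is what
    // the generator's src attribute would need to receive.
    static String toUrl(String pipelinePath) {
        int m = pipelinePath.indexOf(MARKER);
        if (m < 0) {
            return "http://" + pipelinePath;
        }
        return "http://" + pipelinePath.substring(0, m) + "?"
             + pipelinePath.substring(m + MARKER.length());
    }

    public static void main(String[] args) {
        String path = toPipelinePath("www.example.org/search?q=lenya");
        System.out.println(path);        // www.example.org/search/post--parameter/q=lenya
        System.out.println(toUrl(path)); // http://www.example.org/search?q=lenya
    }
}
```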
Re: [Feedmodule] How to declare an entity in a Java transformer?
Posted by Andreas Hartmann <an...@apache.org>.
Hi André,
Florent André wrote:
> Thanks for this pointer!
>
> HTMLGenerator works like a charm!
>
> But when I try to call this HTMLGenerator from an XInclude... it doesn't work! :(
Does it work with the IncludeTransformer?
http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/transformation/IncludeTransformer.html
-- Andreas
--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01
Re: [Feedmodule] How to declare an entity in a Java transformer?
Posted by Florent André <fl...@4sengines.com>.
Thanks for this pointer!
HTMLGenerator works like a charm!
But when I try to call this HTMLGenerator from an XInclude... it doesn't work! :(
I tried:
<xi:include href="cocoon:/retrive/web/adress/without/http://"
and
<xi:include href="cocoon://retrive/web/adress/without/http://"
But neither of them works.
log4j says:
* java.io.FileNotFoundException:
* xIncluded resource not found: file:///
The XInclude seems to look for a file and not a pipeline...
Thank you for any ideas.
Notes:
-- this XInclude is built by an XSL called from the module's sitemap
-- in the module's sitemap, I have one pipeline with this match, but it
is not called:
<!-- pattern = retrive/adress/web/without/http -->
<map:match pattern="retrive/**">
<map:generate src="http://{1}" type="html"/>
<map:serialize type="xml"/>
</map:match>