Posted to user@lenya.apache.org by Andreas Hartmann <an...@apache.org> on 2008/11/20 11:30:05 UTC

Re: [Feedmodule] How to declare an entity in a Java transformer ?

Hi André,

Florent André wrote:
> I would like to parse locally downloaded (via <xi:include parse="text">)
> HTML pages.

I'm afraid this approach will only cause a lot of headaches. I'd rather
recommend using the HTMLGenerator [1] to parse the files. In your
XInclude statement you can just call the HTMLGenerator pipeline using
the cocoon:/ protocol.

[1] http://cocoon.apache.org/2.1/userdocs/html-generator.html

HTH,

-- Andreas

> 
> After the download, <xi:include> gives me an "escaped" HTML file.
> 
> I strip the <!DOCTYPE ... > with a regex, but now the unescape transformer
> throws this error:
> "Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was
> referenced, but not declared."
> 
> I found this on the internet: "To allow the use of &nbsp; in your stylesheet,
> you have to declare it first: <!DOCTYPE xsl:stylesheet [<!ENTITY nbsp
> "&#160;">]>"
> 
> How can I add this declaration in the Java unescape transformer?
> 
> I think I could remove every &nbsp; with a regex, but I would like to
> understand better how the Java transformer works.
> 
> Thanks and have a good day.
> 
> Florent
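
For readers who want to see the entity declaration at work in plain Java:
the sketch below prepends an internal DTD subset that declares nbsp before
handing the text to a standard JAXP parser. It only illustrates the
technique quoted above and is not how the Cocoon transformer is implemented.
The class name NbspEntityDemo, the <wrapper> root element and the helper
textOf are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Cocoon or Lenya.
class NbspEntityDemo {

    // Internal DTD subset declaring nbsp as the no-break space U+00A0.
    static final String DOCTYPE =
        "<!DOCTYPE wrapper [<!ENTITY nbsp \"&#160;\">]>";

    // Wraps the fragment in a root element, parses it and returns its text.
    static String textOf(String fragment) {
        String xml = DOCTYPE + "<wrapper>" + fragment + "</wrapper>";
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Undeclared entities or malformed markup end up here.
            throw new IllegalArgumentException("fragment is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String text = textOf("<p>a&nbsp;b</p>");
        // The middle character is the expanded entity.
        System.out.println((int) text.charAt(1)); // prints 160
    }
}
```

Without the DOCTYPE prefix the same input fails with the "entity was
referenced, but not declared" error quoted above. Real-world HTML uses many
more named entities than nbsp, which is one reason an HTML-aware parser is
the more robust route.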


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@lenya.apache.org
For additional commands, e-mail: user-help@lenya.apache.org


Escape string for xml (was Re: [Feedmodule] How to declare an entity in a Java transformer ?)

Posted by Thorsten Scherler <th...@apache.org>.
On Fri, 2009-03-13 at 11:32 +0100, Florent André wrote:
...
> Thanks Andreas, it works with include... but only for "simple" web addresses
> (without ? and &).
> 
> I solved the ? problem with a "bidouille" (roughly: a hack):
> -------- prepareinclude.xsl:
> * replace the ? with /post--parameter/ using a regex
> * create <include
> src="cocoon://module/webagent/retrivepipe/www/without/http/post--parameter/parameters
> 

I saw that you found another solution; however, please have a look at
http://commons.apache.org/lang/ and
http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html

It has all the escape methods one could wish for.
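
To give an idea of what such an escape method does, here is a small
stand-in written with the JDK only, so it runs without the Commons Lang jar.
It replaces the five characters that have predefined XML entities.
XmlEscapeDemo and its escapeXml method are names invented for this sketch,
and the library's own escapeXml may differ in details (for instance in how
it treats non-ASCII characters).

```java
// Hypothetical demo class; a simplified stand-in, not the Commons Lang code.
class XmlEscapeDemo {

    // Replaces &, <, >, " and ' with their predefined XML entities.
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A query string like the ones discussed in this thread.
        System.out.println(escapeXml("page?a=1&b=<2>"));
        // prints page?a=1&amp;b=&lt;2&gt;
    }
}
```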

Regards
-- 
Thorsten Scherler <thorsten.at.apache.org>
Open Source <consulting, training and solutions>




Re: [Feedmodule] How to declare an entity in a Java transformer?

Posted by Florent André <fl...@4sengines.com>.
Simpler is better!

After a few broken keyboards, I discovered the Cocoon HTMLTransformer... and
it felt like seeing an angel! :)

If you want to download and transform a wide range of web pages (URLs
with ? and &; pages with framesets, or with unclosed <img> tags (!)), you can do this:

--- a sources.xml:
<escaped-html>
<i:include parse="text" src="http://www.adress" />
</escaped-html>

--- in sitemap.xmap

* in :
 <map:components>
    <map:transformers default="xslt">

ADD :
    <map:transformer
      name="html"
      logger="sitemap.transformer.html"
      src="org.apache.cocoon.transformation.HTMLTransformer">
      <!-- Tidy configuration file -->
      <jtidy-config>fallback://lenya/modules/fckeditor/config/jtidy.properties</jtidy-config>
    </map:transformer>

* in : 
 <map:pipelines>

    <map:pipeline type="noncaching">

      <map:match pattern="XXXXXX">

ADD
   
       <map:generate src="test/sources.xml"/>

       <map:transform type="include"/>

       <map:transform type="html">
           <map:parameter name="tags" value="escaped-html"/>
       </map:transform>




And now... back to work for my boss! :p

Have a good weekend



Re: [Feedmodule] How to declare an entity in a Java transformer?

Posted by Florent André <fl...@4sengines.com>.
Hi Lenya friends,

On Thu, 20 Nov 2008 22:10:05 +0100, Andreas Hartmann <an...@apache.org>
wrote:
> Hi André,
> 
> Florent André wrote:
>> Thanks for this pointer!
>> 
>> HtmlGenerator works like a charm!
>> 
>> But when I try to call this HTMLGenerator from an XInclude... it doesn't
>> work! :(
> 
> Does it work with the IncludeTransformer?
> 
> http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/transformation/IncludeTransformer.html
> 
> -- Andreas
> 

Thanks Andreas, it works with include... but only for "simple" web addresses
(without ? and &).

I solved the ? problem with a "bidouille" (roughly: a hack):
-------- prepareinclude.xsl:
* replace the ? with /post--parameter/ using a regex
* create <include
src="cocoon://module/webagent/retrivepipe/www/without/http/post--parameter/parameters

--------- webagent's sitemap.xmap
* <map:match pattern="retrivepipe/**/post-parameter/**/">
*    <map:generate src="http://{1}/post-parameter/{2}" type="html"/> // call to HTMLGenerator
* ...
* </map:match>


But I haven't found any solution for the &:
- this character is translated into &amp; in my XSLT, and the HTMLGenerator
doesn't do the &amp; ==> & conversion...

Do you have a suggestion?
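
One idea that could be tried here, shown only as a sketch: percent-encode the
whole target address so that ?, & and / no longer appear literally in the
include source, and decode it again before fetching. The Java below shows
just the encoding round trip with the JDK's URLEncoder and URLDecoder.
UrlSegmentDemo and the example address are invented for illustration, and
whether a Cocoon sitemap matcher passes such an encoded segment through
unchanged is an assumption that has not been verified.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of Cocoon or Lenya.
class UrlSegmentDemo {

    // Percent-encodes an address so ?, & and / do not appear literally.
    static String encode(String address) {
        try {
            return URLEncoder.encode(address, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode().
    static String decode(String segment) {
        try {
            return URLDecoder.decode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String address = "www.example.org/search?q=lenya&lang=en";
        String segment = encode(address);
        System.out.println(segment);
        // prints www.example.org%2Fsearch%3Fq%3Dlenya%26lang%3Den
        System.out.println(decode(segment).equals(address)); // prints true
    }
}
```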


Have a good day





Re: [Feedmodule] How to declare an entity in a Java transformer ?

Posted by Andreas Hartmann <an...@apache.org>.
Hi André,

Florent André wrote:
> Thanks for this pointer!
> 
> HtmlGenerator works like a charm!
> 
> But when I try to call this HTMLGenerator from an XInclude... it doesn't work!
> :(

Does it work with the IncludeTransformer?

http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/transformation/IncludeTransformer.html

-- Andreas



Re: [Feedmodule] How to declare an entity in a Java transformer?

Posted by Florent André <fl...@4sengines.com>.
Thanks for this pointer!

HtmlGenerator works like a charm!

But when I try to call this HTMLGenerator from an XInclude... it doesn't work!
:(

I tried:
<xi:include href="cocoon:/retrive/web/adress/without/http://"
and
<xi:include href="cocoon://retrive/web/adress/without/http://"

But neither of these works.

The log4j log says:
* java.io.FileNotFoundException:
* xIncluded resource not found: file:///

The XInclude seems to look for a file and not a pipeline...

Thank you for any ideas.

Notes:
-- this XInclude is built by an XSL called from the module's sitemap

-- in the module's sitemap, I have one pipeline with this match, but it
is never called:
<!-- pattern = retrive/adress/web/without/http -->
        <map:match pattern="retrive/**">
                <map:generate src="http://{1}" type="html"/>
                <map:serialize type="xml"/>
        </map:match>


