Posted to users@cocoon.apache.org by Reinhard Haller <re...@interactive-net.de> on 2007/11/21 10:39:42 UTC
2.1.10: charset & nekohtml
Hi,
I have a problem with umlauts in nekohtml with the following URL:
http://www.heise.de/security/news/meldung/99281/
The HTML document doesn't contain any charset declaration, and Neko
garbles the umlauts (the charset of the HTTP response is UTF-8).
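A minimal illustration of this kind of charset problem, using only the JDK (no Neko or Cocoon involved): if a page's UTF-8 bytes are decoded with a single-byte Latin charset, which is roughly what a parser does when it falls back to a default such as ISO-8859-1 or Windows-1252, every umlaut comes out as two characters. The class name and sample word are arbitrary, and whether exactly this happens inside Neko in this case is an assumption.

```java
import java.nio.charset.StandardCharsets;

class UmlautMojibake {
    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "M\u00fcller";     // "Müller", escaped so the source file encoding doesn't matter
        String garbled = misdecode(original); // "M\u00c3\u00bcller", i.e. "MÃ¼ller"

        // ü is two bytes in UTF-8 (C3 BC); a single-byte decoder turns them into Ã and ¼
        System.out.println(original.length() + " chars -> " + garbled.length() + " chars"); // prints "6 chars -> 7 chars"
        System.out.println(garbled.equals("M\u00c3\u00bcller")); // prints "true"
    }
}
```

Plain ASCII text survives this round trip unchanged, which is why only the umlauts look broken.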
My sitemap snippet:
<map:generator label="content" logger="sitemap.generator.html"
name="nekohtml" src="org.apache.cocoon.generation.NekoHTMLGenerator"/>
<map:match pattern="**/*.neko">
<map:generate type="nekohtml"
src="{request-param:serv}" label="debug1x" />
<map:serialize type="xml"/>
</map:match>
Any suggestions, or parameters I should set?
Thanks
Reinhard
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: 2.1.10: charset & nekohtml
Posted by Bertrand Delacretaz <bd...@apache.org>.
On Nov 22, 2007 11:51 AM, Reinhard Haller
<re...@interactive-net.de> wrote:
> Bertrand Delacretaz wrote:
> ...<map:transform type="nekohtml">
> <map:parameter name="input-encoding" value="iso-8859-1" />
> </map:transform>...
>
> ... I'm not convinced the parameter changes anything, as you can see in the
> following sitemap (I also tried iso-8859-1 and utf-8)....
Right, sorry: I double-checked, and that setup was using a slightly
customized version of the NekoHTMLTransformer, to which we had added
this parameter.
Basically, you want this line in NekoHTMLTransformer:
ByteArrayInputStream bais =
new ByteArrayInputStream(text.getBytes());
to use a specific encoding instead of the platform default (which is what
getBytes() without an argument uses), like
ByteArrayInputStream bais =
new ByteArrayInputStream(text.getBytes(inputEncoding));
and you can make this configurable by reading the parameter in the
setup() method:
inputEncoding = par.getParameter("input-encoding", DEFAULT_INPUT_ENCODING);
after declaring these class members:
/** Encoding to use to convert input text for reading by Neko */
final static String DEFAULT_INPUT_ENCODING = "iso-8859-1";
private String inputEncoding = DEFAULT_INPUT_ENCODING;
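The pieces above can be put together in a standalone sketch. This is not the real NekoHTMLTransformer: the class name EncodingAwareInput and its methods are hypothetical, and a plain Map stands in for Cocoon's Parameters object so the code runs without Cocoon. One deliberate difference from the snippet above: it resolves the name with Charset.forName in setup(), so an unknown encoding fails at setup time (with an unchecked exception) instead of when getBytes(String) throws UnsupportedEncodingException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical standalone sketch of the change described above.
 * A Map<String, String> replaces Cocoon's Parameters object.
 */
class EncodingAwareInput {
    /** Encoding to use to convert input text for reading by Neko */
    static final String DEFAULT_INPUT_ENCODING = "iso-8859-1";

    private Charset inputCharset = Charset.forName(DEFAULT_INPUT_ENCODING);

    /** Mirrors setup(): read "input-encoding", falling back to the default. */
    void setup(Map<String, String> par) {
        String name = par.getOrDefault("input-encoding", DEFAULT_INPUT_ENCODING);
        // Charset.forName throws an unchecked exception for unknown names
        inputCharset = Charset.forName(name);
    }

    /** Encodes the buffered text with the configured charset, not the platform default. */
    InputStream toInputStream(String text) {
        return new ByteArrayInputStream(text.getBytes(inputCharset));
    }

    Charset getInputCharset() {
        return inputCharset;
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // "Grüße"
        EncodingAwareInput input = new EncodingAwareInput();
        Map<String, String> par = new HashMap<>();

        input.setup(par); // no parameter given, so the iso-8859-1 default applies
        System.out.println(input.getInputCharset() + ": "
                + input.toInputStream(text).available() + " bytes"); // prints "ISO-8859-1: 5 bytes"

        par.put("input-encoding", "utf-8");
        input.setup(par);
        System.out.println(input.getInputCharset() + ": "
                + input.toInputStream(text).available() + " bytes"); // prints "UTF-8: 7 bytes"
    }
}
```

The byte counts differ because ü and ß take one byte each in ISO-8859-1 but two in UTF-8. Whichever encoding is chosen has to match what Neko then assumes when it reads the stream.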
I don't have time to prepare a patch ATM, but if you want to do it, that
should be simple enough.
-Bertrand
Re: 2.1.10: charset & nekohtml
Posted by Reinhard Haller <re...@interactive-net.de>.
Joerg Heinicke wrote:
> On 22.11.2007 5:51, Reinhard Haller wrote:
>
>>>> ...The html-document doesn't contain any charset spec and neko has a
>>>> charset problem (the charset of the http response is utf-8)....
>>>>
>>>
>>> I've had to use the input-encoding parameter for neko to work
>>> correctly, for example:
>>>
>>> <map:transform type="nekohtml">
>>> <map:parameter name="input-encoding" value="iso-8859-1" />
>>> </map:transform>
>>>
>>>
>> I'm not convinced the parameter changes anything, as you can see in
>> the following sitemap (I also tried iso-8859-1 and utf-8).
>>
>> <map:match pattern="**/*.neko">
>> <map:generate type="nekohtml"
>> src="{request-param:serv}" label="debug1x" >
>> <parameter name="input-encoding" value="1"/>
>
> Is it something as trivial as the correct namespace prefix:
> map:parameter??
>
No, that was only a typo.
Reinhard
Re: 2.1.10: charset & nekohtml
Posted by Joerg Heinicke <jo...@gmx.de>.
On 22.11.2007 5:51, Reinhard Haller wrote:
>>> ...The html-document doesn't contain any charset spec and neko has a
>>> charset problem (the charset of the http response is utf-8)....
>>>
>>
>> I've had to use the input-encoding parameter for neko to work
>> correctly, for example:
>>
>> <map:transform type="nekohtml">
>> <map:parameter name="input-encoding" value="iso-8859-1" />
>> </map:transform>
>>
>>
> I'm not convinced the parameter changes anything, as you can see in the
> following sitemap (I also tried iso-8859-1 and utf-8).
>
> <map:match pattern="**/*.neko">
> <map:generate type="nekohtml" src="{request-param:serv}"
> label="debug1x" >
> <parameter name="input-encoding" value="1"/>
Could it be something as trivial as the missing namespace prefix (map:parameter)?
Joerg
Re: 2.1.10: charset & nekohtml
Posted by Reinhard Haller <re...@interactive-net.de>.
Hi Bertrand,
Bertrand Delacretaz wrote:
> On Nov 21, 2007 10:39 AM, Reinhard Haller
> <re...@interactive-net.de> wrote:
>
>
>> ...The html-document doesn't contain any charset spec and neko has a
>> charset problem (the charset of the http response is utf-8)....
>>
>
> I've had to use the input-encoding parameter for neko to work
> correctly, for example:
>
> <map:transform type="nekohtml">
> <map:parameter name="input-encoding" value="iso-8859-1" />
> </map:transform>
>
>
I'm not convinced the parameter changes anything, as you can see in the
following sitemap (I also tried iso-8859-1 and utf-8).
<map:match pattern="**/*.neko">
<map:generate type="nekohtml"
src="{request-param:serv}" label="debug1x" >
<parameter name="input-encoding" value="1"/>
</map:generate>
<map:serialize type="xml"/>
</map:match>
Greetings
Reinhard
Re: 2.1.10: charset & nekohtml
Posted by Bertrand Delacretaz <bd...@apache.org>.
On Nov 21, 2007 10:39 AM, Reinhard Haller
<re...@interactive-net.de> wrote:
> ...The html-document doesn't contain any charset spec and neko has a
> charset problem (the charset of the http response is utf-8)....
I've had to use the input-encoding parameter for neko to work
correctly, for example:
<map:transform type="nekohtml">
<map:parameter name="input-encoding" value="iso-8859-1" />
</map:transform>
Haven't investigated exactly what's happening there.
-Bertrand
Re: 2.1.10: charset & nekohtml
Posted by Reinhard Haller <re...@interactive-net.de>.
Hi Ignacio,
listas@carmenynacho.com wrote:
>> From: Reinhard Haller [mailto:reinhard.haller@interactive-net.de]
>> Sent: Wednesday, November 21, 2007 10:40 AM
>>
>
> AFAIK to config neko you will need to pass a properties file (excerpt from
> default cocoon.xconf):
>
> <map:generator label="content" logger="sitemap.generator.html"
> name="nekohtml" pool-max="${nekohtml-generator.pool-max}"
> src="org.apache.cocoon.generation.NekoHTMLGenerator">
> <neko-config>context://WEB-INF/neko.properties</neko-config>
> </map:generator>
>
>
> You can then tweak the properties file pointed by neko-config.. There is a
> neko.properties in the default install..Buried inside, it is:
>
> http\://cyberneko.org/html/properties/default-encoding=Windows-1252
>
I know NekoHTML works in a similar way to the old Tidy HTML generator.
I'm not sure setting the default encoding really solves my
problem. If you analyze the HTTP response for
http://www.heise.de/security/news/meldung/99281/
you can see the charset is declared as UTF-8. So I changed the Neko
default-encoding property to
http\://cyberneko.org/html/properties/default-encoding=utf-8
The resulting Neko output has the same umlaut errors as all
my other attempts.
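Checking which charset the server actually declares can be done in code as well. The sketch below is a hypothetical helper (not part of Cocoon or NekoHTML) that extracts the charset parameter from a Content-Type value such as "text/html; charset=utf-8", assuming the header value is already available as a string, for example from java.net.URLConnection.getContentType(). It then decodes sample bytes with the declared charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    /**
     * Hypothetical helper: returns the charset named in a Content-Type value,
     * or the fallback if no charset parameter is present.
     * Charset.forName throws an unchecked exception for unknown names.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Bytes as a server declaring UTF-8 would send "Müller"
        byte[] body = "M\u00fcller".getBytes(StandardCharsets.UTF_8);
        Charset declared = charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1);

        String text = new String(body, declared); // decoded with the charset from the header
        System.out.println(declared + " -> " + text.length() + " chars"); // prints "UTF-8 -> 6 chars"
    }
}
```

The fallback parameter plays the same role as Neko's default-encoding: it only applies when nothing is declared, so whether Neko ever sees the HTTP header in this pipeline is the open question.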
Any suggestions?
Thanks
Reinhard
RE: 2.1.10: charset & nekohtml
Posted by li...@carmenynacho.com.
> From: Reinhard Haller [mailto:reinhard.haller@interactive-net.de]
> Sent: Wednesday, November 21, 2007 10:40 AM
AFAIK, to configure Neko you need to pass it a properties file (excerpt from
the default cocoon.xconf):
<map:generator label="content" logger="sitemap.generator.html"
name="nekohtml" pool-max="${nekohtml-generator.pool-max}"
src="org.apache.cocoon.generation.NekoHTMLGenerator">
<neko-config>context://WEB-INF/neko.properties</neko-config>
</map:generator>
You can then tweak the properties file that neko-config points to. There is a
neko.properties in the default install; buried inside it is:
http\://cyberneko.org/html/properties/default-encoding=Windows-1252
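The backslash in that key is properties-file escaping: in java.util.Properties syntax an unescaped ':' (like '=') ends the key. Assuming the file is read with the standard Properties loader (an assumption about how Cocoon loads neko-config), this sketch shows which key the line produces, and what happens without the escape. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class NekoPropertiesKey {
    /** Loads properties text and returns the value stored under the given key (or null). */
    static String lookup(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not actually fail
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "http://cyberneko.org/html/properties/default-encoding";

        // Same line as in neko.properties: "\:" keeps the colon inside the key
        String escaped = "http\\://cyberneko.org/html/properties/default-encoding=Windows-1252\n";
        System.out.println(lookup(escaped, key)); // prints "Windows-1252"

        // Without the backslash, the first ':' ends the key, so the key becomes just "http"
        String unescaped = "http://cyberneko.org/html/properties/default-encoding=Windows-1252\n";
        System.out.println(lookup(unescaped, key));    // prints "null"
        System.out.println(lookup(unescaped, "http")); // prints "//cyberneko.org/html/properties/default-encoding=Windows-1252"
    }
}
```

So when changing the default encoding, only the value after '=' should be edited; the escaped key has to stay as it is.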
HTH
Experience is the mother of science ;)
Saludos,
Ignacio J. Ortega