Posted to users@cocoon.apache.org by leon tian <ti...@yahoo.co.uk> on 2004/05/17 07:59:04 UTC

the Â problem.

hi,
 
There are always "Â" characters on my transformed pages because of "&nbsp;". How can I get rid of them? I use the XHTML serializer. Any help will be most appreciated ;)
 
 
Leon


		

Re: the Â_problem.

Posted by Joerg Heinicke <jo...@gmx.de>.
On 18.05.2004 00:57, Upayavira wrote:

>> And configuring the parser instead of jtidy is not possible?
> 
> Sorry? Tidy takes in an input stream and gives out a DOM which is passed 
> to a DOMStreamer. What parser should be configured?

Ah, of course. I'm tired :) I thought of some jtidy output that must be 
parsed. But with a DOM that's obviously not the case.

Joerg

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: the Â_problem.

Posted by Upayavira <uv...@upaya.co.uk>.
Joerg Heinicke wrote:

> On 18.05.2004 00:25, Upayavira wrote:
>
>>>> I use html generator without configuration and xhtml serializer
>>>> encoding to UTF-8. Could you tell me where the problem may be?
>>>
>>>
>>> The remote web page has a specific encoding. I guess the HTML 
>>> generator is ignoring it and parses the remote webpage probably 
>>> using UTF-8. I don't know about the details or how to solve it. 
>>> Maybe you can get jtidy to output XML in a specific encoding that 
>>> the parser parsing the jtidy output expects.
>>
>>
>>
>> I've recently tried to change the encoding on JTidy. It doesn't seem 
>> to work. I followed it right in in a debugger - the configured locale 
>> was set right inside JTidy, but it still outputted ISO-8859-1. No UTF-8.
>>
>> I'm thinking of extending the HTML generator to use something like 
>> NekoHTML (I'm using it right now for a work project, and I reckon 
>> it'd be pretty easy to do (like 10 lines of code). So the generator 
>> would be configurable as to which tool it uses.
>
>
> And configuring the parser instead of jtidy is not possible?

Sorry? Tidy takes in an input stream and gives out a DOM which is passed 
to a DOMStreamer. What parser should be configured?

Upayavira





Re: the Â_problem.

Posted by leon tian <ti...@yahoo.co.uk>.
Hi, I want to configure JTidy. How can I set the URL of jtidy.properties using the resource: protocol in the sitemap?

Joerg Heinicke <jo...@gmx.de> wrote:
On 18.05.2004 00:25, Upayavira wrote:

>>> I use html generator without configuration and xhtml serializer
>>> encoding to UTF-8. Could you tell me where the problem may be?
>>
>> The remote web page has a specific encoding. I guess the HTML 
>> generator is ignoring it and parses the remote webpage probably using 
>> UTF-8. I don't know about the details or how to solve it. Maybe you 
>> can get jtidy to output XML in a specific encoding that the parser 
>> parsing the jtidy output expects.
> 
> 
> I've recently tried to change the encoding on JTidy. It doesn't seem to 
> work. I followed it right in in a debugger - the configured locale was 
> set right inside JTidy, but it still outputted ISO-8859-1. No UTF-8.
> 
> I'm thinking of extending the HTML generator to use something like 
> NekoHTML (I'm using it right now for a work project, and I reckon it'd 
> be pretty easy to do (like 10 lines of code). So the generator would be 
> configurable as to which tool it uses.

And configuring the parser instead of jtidy is not possible?

Joerg


Re: the Â_problem.

Posted by Joerg Heinicke <jo...@gmx.de>.
On 18.05.2004 00:25, Upayavira wrote:

>>> I use html generator without configuration and xhtml serializer
>>> encoding to UTF-8. Could you tell me where the problem may be?
>>
>> The remote web page has a specific encoding. I guess the HTML 
>> generator is ignoring it and parses the remote webpage probably using 
>> UTF-8. I don't know about the details or how to solve it. Maybe you 
>> can get jtidy to output XML in a specific encoding that the parser 
>> parsing the jtidy output expects.
> 
> 
> I've recently tried to change the encoding on JTidy. It doesn't seem to 
> work. I followed it right in in a debugger - the configured locale was 
> set right inside JTidy, but it still outputted ISO-8859-1. No UTF-8.
> 
> I'm thinking of extending the HTML generator to use something like 
> NekoHTML (I'm using it right now for a work project, and I reckon it'd 
> be pretty easy to do (like 10 lines of code). So the generator would be 
> configurable as to which tool it uses.

And configuring the parser instead of jtidy is not possible?

Joerg



Re: the Â_problem.

Posted by leon tian <ti...@yahoo.co.uk>.
Thanks. I'll try it;)


Upayavira <uv...@upaya.co.uk> wrote:
Joerg Heinicke wrote:

> On 18.05.2004 00:01, leon tian wrote:
>
>> I use html generator without configuration and xhtml serializer
>> encoding to UTF-8. Could you tell me where the problem may be?
>
>
> The remote web page has a specific encoding. I guess the HTML 
> generator is ignoring it and parses the remote webpage probably using 
> UTF-8. I don't know about the details or how to solve it. Maybe you 
> can get jtidy to output XML in a specific encoding that the parser 
> parsing the jtidy output expects.

I've recently tried to change the encoding on JTidy. It doesn't seem to 
work. I followed it right in in a debugger - the configured locale was 
set right inside JTidy, but it still outputted ISO-8859-1. No UTF-8.

I'm thinking of extending the HTML generator to use something like 
NekoHTML (I'm using it right now for a work project, and I reckon it'd 
be pretty easy to do (like 10 lines of code). So the generator would be 
configurable as to which tool it uses.

Upayavira




Re: the Â_problem.

Posted by Upayavira <uv...@upaya.co.uk>.
Joerg Heinicke wrote:

> On 18.05.2004 00:01, leon tian wrote:
>
>> I use html generator without configuration and xhtml serializer
>> encoding to UTF-8. Could you tell me where the problem may be?
>
>
> The remote web page has a specific encoding. I guess the HTML 
> generator is ignoring it and parses the remote webpage probably using 
> UTF-8. I don't know about the details or how to solve it. Maybe you 
> can get jtidy to output XML in a specific encoding that the parser 
> parsing the jtidy output expects.

I recently tried to change the encoding in JTidy, but it doesn't seem to 
work. I followed it right through in a debugger: the configured locale was 
set correctly inside JTidy, but it still output ISO-8859-1, not UTF-8.

I'm thinking of extending the HTML generator to use something like 
NekoHTML. I'm using it right now for a work project, and I reckon it'd 
be pretty easy to do (around 10 lines of code). The generator would then 
be configurable as to which tool it uses.

Upayavira





Re: the Â_problem.

Posted by Joerg Heinicke <jo...@gmx.de>.
On 18.05.2004 00:01, leon tian wrote:

> I use html generator without configuration and xhtml serializer
> encoding to UTF-8. Could you tell me where the problem may be?

The remote web page has a specific encoding. I guess the HTML generator 
is ignoring it and parses the remote webpage probably using UTF-8. I 
don't know about the details or how to solve it. Maybe you can get jtidy 
to output XML in a specific encoding that the parser parsing the jtidy 
output expects.

Joerg
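
One way to rule out a wrong guess by the parser is to decode the fetched bytes yourself, using the charset the remote server declares, before handing the text to any HTML tool. The following is a minimal sketch using only the JDK. The class name, the helper `charsetFromContentType`, and the ISO-8859-1 fallback are assumptions made for illustration; none of this is part of Cocoon's HTML generator or JTidy.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class DeclaredCharset {
    // Hypothetical helper: pick the charset named in a Content-Type header
    // value such as "text/html; charset=UTF-8".
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the fallback below.
                    }
                }
            }
        }
        // Assumed fallback when nothing usable is declared.
        return StandardCharsets.ISO_8859_1;
    }

    // Decode the response body with the declared charset instead of a fixed one.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType));
    }

    public static void main(String[] args) {
        byte[] body = "a\u00A0b".getBytes(StandardCharsets.UTF_8);
        String right = decode(body, "text/html; charset=UTF-8");
        String wrong = decode(body, "text/html");
        System.out.println(right.length()); // 3 characters: a, NBSP, b
        System.out.println(wrong.length()); // 4 characters: a, Â, NBSP, b
    }
}
```

A real page may also declare its encoding in a `<meta>` tag, which this sketch does not inspect.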



Re: the Â_problem.

Posted by leon tian <ti...@yahoo.co.uk>.
I use the HTML generator without any configuration and the XHTML serializer with its encoding set to UTF-8. Could you tell me where the problem may be? Thanks.
 

Joerg Heinicke <jo...@gmx.de> wrote:
On 17.05.2004 14:59, leon tian wrote:

> hi,
> 
> thanks for your advice. but i wanna transform the web pages from the
> internet and then display them. there are already "&nbsp" between
> tags in the web pages. how can i change them to "&#160;"? should i
> configurate JTidy or XHTML serializer, or any other way?


IMO that's not related to any entity but directly to the encoding. I guess 
you're parsing the files in the wrong encoding.

Joerg


Re: the Â_problem.

Posted by Joerg Heinicke <jo...@gmx.de>.
On 17.05.2004 14:59, leon tian wrote:

> hi,
> 
> thanks for your advice. but i wanna transform the web pages from the
> internet and then display them. there are already "&nbsp" between
> tags in the web pages. how can i change them to "&#160;"? should i
> configurate JTidy or XHTML serializer, or any other way?


IMO that's not related to any entity but directly to the encoding. I guess 
you're parsing the files in the wrong encoding.

Joerg
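
To illustrate the encoding diagnosis: a no-break space (U+00A0) is stored in UTF-8 as the two bytes C2 A0. If those bytes are read back as ISO-8859-1, the first byte turns into "Â" (U+00C2) and the second into a no-break space, which matches the symptom reported in this thread. The sketch below reproduces that with the JDK only; the class and method names are made up for illustration, and it does not claim to show where in the Cocoon pipeline the mismatch actually happens.

```java
import java.nio.charset.StandardCharsets;

class NbspMojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("a\u00A0b");
        // Prints the code points U+0061, U+00C2, U+00A0, U+0062, one per line.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The extra U+00C2 appears once for every no-break space in the page, which is why the stray "Â" shows up wherever the source used "&nbsp;".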



Re: the Â_problem.

Posted by leon tian <ti...@yahoo.co.uk>.
hi, 
 
Thanks for your advice. But I want to transform web pages from the internet and then display them. There are already "&nbsp" entities between tags in those pages. How can I change them to "&#160;"? Should I configure JTidy or the XHTML serializer, or is there another way?
 
best regards,
 
 
Leon

"John L. Webber" <Jo...@jentro.com> wrote:
Hi Leon,

Try using "&#160;"

John

leon tian wrote:

> hi,
> 
> there are always "Â" on the my transformed pages because of 
> "&nbsp;". how can i get rid of them? i use XHTML serializer. any help 
> will be most appreciated;)
> 
> 
> Leon
>



-- 
---------------------------------------------------------
Jentro Technologies GmbH
John L. Webber, Software Development
---------------------------------------------------------
Rosenheimer Str. 145e 81671 München
Tel. +49 89 189 169 80 mailto:John.Webber@jentro.com 
Fax +49 89 189 169 99 http://www.jentro.com
---------------------------------------------------------
NOTICE: The information contained in this e-mail is confidential or may otherwise be legally privileged. It is intended for the named recipient only. If you have received it in error, please notify us immediately by reply or by calling the telephone number above and delete this message and all its attachments without any use or further distribution of its contents. Please note that any unauthorised review, copying, disclosing or otherwise making use of the information is strictly prohibited. Thank you. 
---------------------------------------------------------



Re: the Â problem.

Posted by "John L. Webber" <Jo...@jentro.com>.
Hi Leon,

Try using "&#160;"

John
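
For background on this suggestion: "&#160;" is a numeric character reference, which any XML parser accepts, while "&nbsp;" is a named entity that only exists if a DTD (such as the XHTML one) declares it. The sketch below shows the difference with the JDK's built-in parser; the class and method names are illustrative assumptions. As other replies in this thread point out, switching the reference does not by itself fix an encoding mismatch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NbspReferenceDemo {
    // Returns the text content of the root element,
    // or null if the input is not well-formed XML.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (SAXException e) {
            return null; // e.g. an entity that was referenced but never declared
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Numeric reference: parses to a, U+00A0, b.
        System.out.println("a\u00A0b".equals(parseText("<p>a&#160;b</p>")));
        // Named entity without a DTD: rejected by the parser.
        System.out.println(parseText("<p>a&nbsp;b</p>") == null);
    }
}
```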

leon tian wrote:

> hi,
>  
> there are always "Â" on the my transformed pages because of 
> "&nbsp;". how can i get rid of them? i use XHTML serializer. any help 
> will be most appreciated;)
>  
>  
> Leon
>



-- 
---------------------------------------------------------
 Jentro Technologies GmbH
 John L. Webber, Software Development
---------------------------------------------------------
 Rosenheimer Str. 145e	    81671 München
 Tel. +49 89 189 169 80     mailto:John.Webber@jentro.com 
 Fax  +49 89 189 169 99     http://www.jentro.com
---------------------------------------------------------

