Posted to dev@forrest.apache.org by "Børre Gaup (JIRA)" <ji...@apache.org> on 2005/09/08 09:22:30 UTC

[jira] Created: (FOR-668) UTF-8 encoded .ihtml documents gives garbled output

UTF-8 encoded .ihtml documents gives garbled output
---------------------------------------------------

         Key: FOR-668
         URL: http://issues.apache.org/jira/browse/FOR-668
     Project: Forrest
        Type: Bug
    Versions: 0.7    
 Environment: PowerPC Linux/IBM j2se.1.4.2, x86 Linux/Sun j2se1.5
    Reporter: Børre Gaup
    Priority: Minor


Non-ASCII characters get garbled: "á" becomes "√°", "ø" becomes "√∏", and so on.
It is the same phenomenon as described in FOR-667 (http://issues.apache.org/jira/browse/FOR-667), but in another setting.
These kinds of documents work using Mac OS X with the built-in Java.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (FOR-668) UTF-8 encoded .ihtml documents gives garbled output

Posted by "Tim Williams (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/FOR-668?page=all ]

Tim Williams updated FOR-668:
-----------------------------

      Component: Core operations
                 Documentation and website
    Description: 
Non-ASCII characters get garbled: "á" becomes "√°", "ø" becomes "√∏", and so on.
It is the same phenomenon as described in FOR-667 (http://issues.apache.org/jira/browse/FOR-667), but in another setting.
These kinds of documents work using Mac OS X with the built-in Java.

  was:
Non-ASCII characters get garbled: "á" becomes "√°", "ø" becomes "√∏", and so on.
It is the same phenomenon as described in FOR-667 (http://issues.apache.org/jira/browse/FOR-667), but in another setting.
These kinds of documents work using Mac OS X with the built-in Java.


If anyone can confirm that this solution is the preferred one rather than a workaround, we should document it properly and close the issue.


[jira] Commented: (FOR-668) UTF-8 encoded .ihtml documents gives garbled output

Posted by "Børre Gaup (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/FOR-668?page=comments#action_12331700 ] 

Børre Gaup commented on FOR-668:
--------------------------------

Thank you, that solved it nicely.


[jira] Commented: (FOR-668) UTF-8 encoded .ihtml documents gives garbled output

Posted by "Sjur N. Moshagen (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/FOR-668?page=comments#action_12331513 ] 

Sjur N. Moshagen commented on FOR-668:
--------------------------------------

The last sentence doesn't make much sense; it should read:

The bug appears when running Forrest on Mac OS X 10.4 (and 10.3) with the default Java (1.4.2). I have not tested it with other Java versions.

The bug is not seen on (some) Linux configurations.

It appears that the HTML reader (as well as the jspwiki reader, see FOR-667) uses the encoding of the Java default locale, regardless of any attempt to specify otherwise. There is also no way to tell Forrest to read a file, or a class of files, using a particular encoding.

The HTML reader should obey the charset declared in the header of the file.
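
To illustrate the mechanism described above, here is a minimal Java sketch. It is not Forrest or JTidy code; the class name EncodingDemo and the method misread are made up for this example, and it assumes Java 7 or later for StandardCharsets. It takes the UTF-8 bytes of "á" and "ø" and decodes them once with the wrong charset (ISO-8859-1, i.e. latin1) and once with the right one. With latin1, each two-byte UTF-8 sequence turns into two unrelated characters. The "√°" and "√∏" in the report are what the same bytes (0xC3 0xA1 and 0xC3 0xB8) look like when read as Mac Roman, which suggests the same kind of mismatch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: encode `text` as UTF-8, then decode the bytes with
    // `assumed`, mimicking a reader that ignores the file's real encoding.
    public static String misread(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        // U+00E1 (a with acute) and U+00F8 (o with stroke), written as
        // escapes so the source file itself stays pure ASCII.
        String original = "\u00e1\u00f8";

        // What a reader falls back to when no encoding is given explicitly.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        String wrong = misread(original, StandardCharsets.ISO_8859_1);
        String right = misread(original, StandardCharsets.UTF_8);

        // Two characters in, four characters out when decoded as latin1.
        System.out.println("Decoded as ISO-8859-1, length: " + wrong.length());
        System.out.println("Decoded as UTF-8 matches original: " + right.equals(original));
    }
}
```

The point of the sketch is only that the decoding charset has to be chosen explicitly (or taken from the document's own declaration) rather than inherited from the platform default.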


[jira] Commented: (FOR-668) UTF-8 encoded .ihtml documents gives garbled output

Posted by "Miroslav Mocek (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/FOR-668?page=comments#action_12331576 ] 

Miroslav Mocek commented on FOR-668:
------------------------------------

I've solved similar problems with the following change; maybe it helps:

Edit <forrest>/main/webapp/WEB-INF/jtidy.properties and replace
char-encoding=latin1
with
char-encoding=utf8


